WILLIAM SEALY GOSSET,


The two appreciations which follow have been written from somewhat 
different angles. The first is by a younger colleague and friend at the St James’ 
Gate Brewery, who for a number of years was in close contact with Gosset in 
Dublin, both in and out of the brewery. The friendship of the second writer is one 
which grew through a correspondence that roved at length over statistical 
methods and theories. If in some places the articles overlap, this will only 
help to emphasize certain events or characteristics which independently we 
have felt impelled to record. 

Both of us would like to express our warmest thanks to the many friends 
who have helped us, and in particular to Mrs W. S. Gosset and Mr E. Somerfield.


L. McM., E. S. P.


(1) “STUDENT” AS A MAN 
By L. McMULLEN 


WILLIAM SEALY GOSSET was the eldest son of Colonel Frederic Gosset, R.E.,
and was born at Canterbury in 1876. In 1906 he married Marjory Surtees 
Phillpotts, daughter of the late headmaster of Bedford School, and they had 
one son and two daughters. He died on 16 October 1937, and was survived by 
both his parents, his wife and children, and one grandson. 

He was educated at Winchester, where he was a scholar, and New College, 
Oxford, where he studied chemistry and mathematics. 

He entered the service of Messrs Guinness as a brewer in 1899. 

It is not known exactly how or when “Student’s” interest in statistics was
first aroused, but at this period scientific methods and laboratory determinations 
were beginning to be seriously applied to brewing, and it is obvious that some 
knowledge of error functions would be necessary. A number of university men 
with science degrees had been taken on, and it is probable that “Student”, who 
was the most mathematical of them, was appealed to by the others with various 
questions and so began to study the subject. It is known that he could calculate 
a probable error in 1903. The circumstances of brewing work, with its variable 
materials and susceptibility to temperature change and necessarily short series 
of experiments, are all such as to show up most rapidly the limitations of large
sample theory and emphasize the necessity for a correct method of treating
small samples. It was thus no accident, but the circumstances of his work, that
directed “Student’s” attention to this problem, and so led to his discovery of
the distribution of the sample standard deviation, which gave rise to what in
its modern form is known as the t-test. For a long time after its discovery and
publication the use of this test hardly spread outside Guinness’s brewery, where 
it has been very extensively used ever since. In the Biometric school at 
University College the problems investigated were almost all concerned with 
much larger samples than those in which “studentizing”, as it was sometimes
called, made any difference. Nevertheless, although their lines of research
diverged somewhat rapidly, the close statistical contact and personal friendship
between Karl Pearson and “Student”, which began during his year at University
College, were only terminated by death. 

The purpose of this note is not however to give an account of “Student’s”
statistical work, but to try to give a more general impression of the man himself. 
Although his public reputation was entirely as a statistician, and he was 
acknowledged to be one of the leading investigators in that subject, his time was 
never wholly and rarely even mainly occupied with statistical matters. For one 
who saw enough of him to know roughly how his time was spent both at work 
and at home, it was very difficult to understand how he managed to get so 
much activity into the day. At work he got through an enormous amount of 
the ordinary routine of the brewery, as well as his statistics. Until 1922 he had 
no regular statistical assistant, and did all the statistics and most of the 
arithmetic himself; later there was a definite department, of which he was in 
charge till 1934, but throughout he did a great deal of arithmetic and spade-work
himself. It might be supposed from the amount he did in the time that
he was unusually good at arithmetic and the arrangement of work; such, 
however, was not the case, for his arithmetic frequently contained minor errors. 
In one of his obituary notices a tendency to do work on the backs of envelopes 
in trains was mentioned, but this tendency was not confined to trains; even in 
his office much work was done on random scraps of paper. He also had a great 
dislike of the tabulation of results, and preferred to do everything from first 
principles whenever possible. This preference led in certain instances to waste 
of time in routine work, but was of assistance in maintaining that flexibility and 
speed of attack on new problems which was so characteristic of him. An actual 
example would need too much explanation of relevant circumstances, but I can 
vouch for the analogical truth of the following. If a body performs simple
harmonic motion with acceleration $\mu$ per unit displacement, it may readily be
shown that the period of a complete oscillation is $2\pi/\sqrt{\mu}$. Hence, in the case of
a simple pendulum $t = 2\pi\sqrt{l/g}$ and $l = gt^2/4\pi^2$, where $l$ is the length of the
pendulum and $g$ the acceleration due to gravity. If it were necessary to calculate
the lengths of pendulum corresponding to different periods as a routine matter,
most people would evaluate $g/4\pi^2$ for their locality and always multiply $t^2$ by
this numerical constant, which would be about 24.85. “Student” would probably
have started from $2\pi/\sqrt{\mu}$ every time. If therefore he had suddenly wanted to
calculate the period of oscillation of a weight on a stretched spring he could
have done it, whereas the man who only remembered that $l = 24.85t^2$ for a
pendulum would be unable to tackle the problem without much more preliminary
work.
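
The contrast between the two habits of work may be made concrete by a small sketch. It is offered only as an illustration: the function names are invented here, and the value g = 981 cm/s² is an assumption, chosen because it reproduces the constant 24.85 quoted above when lengths are in centimetres and periods in seconds.

```python
import math


def period_from_mu(mu):
    """Period of simple harmonic motion with acceleration mu per unit displacement."""
    return 2.0 * math.pi / math.sqrt(mu)


def pendulum_length(period, g=981.0):
    """Length l = g t^2 / (4 pi^2) of a simple pendulum with the given period.

    g = 981 cm/s^2 is an assumed local value, so the length is in centimetres.
    """
    return g * period ** 2 / (4.0 * math.pi ** 2)


def spring_period(mass, stiffness):
    """Period of a weight on a spring, where mu = stiffness / mass."""
    return period_from_mu(stiffness / mass)


# The routine worker's tabulated shortcut: g / (4 pi^2), the length for t = 1.
routine_constant = pendulum_length(1.0)
print(round(routine_constant, 2))  # 24.85

# Starting from 2 pi / sqrt(mu) handles the pendulum (mu = g / l) ...
print(period_from_mu(981.0 / routine_constant))
# ... and equally a spring, which the memorized constant cannot do.
print(spring_period(mass=1.0, stiffness=4.0 * math.pi ** 2))
```

The constant alone answers only the pendulum question, whereas the general formula covers any system for which the acceleration per unit displacement can be written down.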

His method was, of course, not necessarily the most suitable for others not 
aspiring to the same degree of versatility. Perhaps it is not altogether fanciful 
to compare the two methods with the organic evolution of, say, the human hand, 
the most versatile object known, and the construction of some highly efficient 
but absolutely specialized piece of machinery. I do not mean to imply that he 
gave this explanation, or was even altogether conscious of it. When he handed 
over to me a routine calculation which he had done for many years, I was 
astonished to find that he had written out every week an almost unvarying form 
of words with different figures. To my question, “Why ever don’t you get a
printed form?” he did not reply, “Doing it from first principles every time
preserves mental flexibility”. He would have considered such a remark
unbearably pompous. He said, “Because I’m too lazy”, to which I replied,
“Well, I’m too lazy not to.”

To many in the statistical world “Student” was regarded as a statistical 
adviser to Guinness’s brewery; to others he appeared to be a brewer devoting 
his spare time to statistics. I have tried to show that though there is some 
truth in both of these ideas they miss the central point, which was the intimate 
connexion between his statistical research and the practical problems on which 
he was engaged. I can imagine that many think it wasteful that a man of his 
undoubted genius should have been engaged in industry, yet I am sure that it is 
just that association with immediate practical problems which gives “Student’s”
work its unique character and importance relative to its small volume. On at 
least one occasion he was offered an academic appointment, but it is almost 
certain that he would not have been a successful lecturer, though perhaps a good 
individual teacher; nor is it likely that his research work would have flourished 
in more academic circumstances; his mind worked in a different way. 

The work in connexion with barley breeding carried out by the Department 
of Agriculture in Ireland, in which Messrs Guinness took a prominent part, 
enabled “Student” to get that first-hand experience of yield trials and
agricultural experiments generally which contributed so largely to his great knowledge
of the subject. He did not merely sit in his office and calculate the results, but 
discussed all the details and difficulties with the Department officials, and went 
round all the experiments before harvest, when a “grand tour” is annually 
carried out by the Department, the brewery, and sometimes statisticians or 
others interested from England or abroad. As well as the work carried out at the 
actual cereal station near Cork, three or four varieties of barley are grown in 
? or 1 acre plots at ten farms representing all the principal barley-growing
districts of Ireland, so a visit to all of them entails a fairly comprehensive 
inspection of the crops. 

“Student” took a great deal of interest in this work from the beginning and 
correspondence shows that he discussed the results of these tests with Karl 
Pearson at great length when he went to study with him at University College 
in 1906. 

In the last ten years or so of his time in Ireland he played a leading part in 
these investigations, and thus had a perhaps unique opportunity of following 
experimental varieties from sowing through growing and harvest to malting 
and brewing results, and also of carrying out or supervising all the relevant 
mathematical work. At one time he also made some barley crosses in his own 
garden, and accelerated their multiplication by having one generation grown 
in New Zealand during our winter. These crosses were known as Student I and
II, and have now been discarded as failures, the inevitable fate of the large
majority. With characteristic self-effacement he was the first to point out that 
they were not worth going on with. 

He also made frequent visits to Dr E. S. Beaven, whose work on barley
breeding is well known, and discussed every aspect of yield trials with him. 
These visits were undoubtedly very useful, and although Dr Beaven is never 
tired of protesting that he is no mathematician and does not understand 
“magic squares” or “birds of freedom”, which he prefers to the more orthodox 
expressions, he has a vast experience of agricultural trials and is very quick to 
see the weak point of any experiment. 

In spite of the quantity of work “Student” did he was never in a hurry or
fussed; this was largely due to the absence of lag when he turned his mind to a
new subject; unfortunately others were not always equal to this. He would 
ring one up on the phone and plunge straight into some subject which might 
have been discussed some days previously. The slower-witted listener would 
probably lose the thread of his discourse before realizing what it was about and 
would ignominiously have to ask him to begin again. I have many times seen 
him hard at it on a Monday morning, but at first meeting it was always “How
did the sailing go?” “Well, did you catch any fish?”, and he would recount any
notable event of his own week-end before plunging into the very middle of some
subject. I never heard him say “I’m busy”.

“Student” had many correspondents, mostly agricultural and other
experimenters, in different parts of the world. He took immense pains with these
and often explained points to them at great length when he could easily have 
given a reference. His letters contain some of his clearest writing, and the more 
difficult points are often better elucidated than in his published papers. 

Karl Pearson emphasized the fact that a statistician must advise others on 
their own subject, and so may incur the accusation of butting in without 
adequate knowledge. “Student” was particularly expert at avoiding any such
disagreement; usually he was such an enthusiastic learner of the other’s subject
that the fact that he was giving advice escaped notice. 

The reader will by now have realized that “Student” did a very large quantity 
of ordinary routine as well as his statistical work in the brewery, and all that in 
addition to consultative statistical work and to preparing his various published 
papers. It might thus be thought that he could have done nothing else but eat 
and sleep when at home; this, however, was far from being the case, and he had 
a great many domestic and sporting interests. He was a keen fruit-grower and 
specialized in pears. He was also a good carpenter, and built a number of boats;
the last, which was completed in 1932, and on whose maiden voyage I had the 
honour to be nearly frozen to death, was equipped with a rudder at each end 
by means of which the direction and speed of drift could be adjusted—an 
advantage which will be readily appreciated by fly-fishermen. This boat with 
its arrangement of rudders was described in the Field of 28 March 1936. In his 
carpentry he showed preferences analogous to his mathematical ones previously 
mentioned; he disliked complicated or specific tools, and liked to do anything
possible with a pen-knife. On one occasion, seeing him countersinking
screw-holes with a pocket-knife, I offered him a proper countersink bit which I had
with me, but he declined it with some embarrassment, as he would not have 
liked to explain or perhaps could not have explained why he preferred using the 
pen-knife. Out of doors he was an energetic walker and also cycled extensively 
in the pre-war period. He did a lot of sailing and fishing. For his last boat he 
had a most unconventional sail, which cannot be exactly described under any 
of the usual categories; it was illustrated in the Field article referred to above. 

In fishing he was an efficient performer; he used to hold that only the size 
and general lightness or darkness of a fly were important; the blue wings, red 
tails and so on being only to attract the fisherman to the shop. This view was 
more revolutionary when I first heard it than it is now. He was a sound though 
not spectacular shot, and was well above the average on skates. Until the 
accident to his leg in 1934 he was quite a regular golfer, and once went round a 
fairly difficult course in 85 strokes and 1} hours by himself. He used a remarkable 
collection of old clubs dating at least from the beginning of the century. In the 
last few years since his accident he took up bowls with great keenness, and 
induced many other people to play as well. One of his last visits to Ireland was 
with a team which he had organized at the new brewery at Park Royal. 

On top of all this he knew as much as most people of the affairs of the world 
in general and of what was going on about him. It became very difficult to 
imagine how he found 24 hours in any way a sufficient length for the day. His 
wife certainly organized things so that the minimum amount of time was wasted, 
but even so few people could approach such activity in quantity or diversity. 

In personal relationships he was very kindly and tolerant and absolutely 
devoid of malice. He rarely spoke about personal matters but when he did his
opinion was well worth listening to and not in the least superficial. 

In the summer of 1934 he had a motor accident and broke the neck of his 
femur. He had to lie up for three months, of course working at statistics, and was 
a semi-cripple for a year. This was particularly irksome for such an active man, 
as was the sheer unnecessariness of the accident, for he ran into a lamp-post on a 
straight road, through looking down to adjust some stuff he was carrying; but 
with great hard work and persistence he eventually reduced the disability to a 
slight limp. 

At the end of 1935 he left Ireland to take charge of the new Guinness
brewery in London, and I saw comparatively little of him after that. The
departure from Ireland of “Student” and his family was a great loss to many
who had experienced their hospitality.

His work in London was necessarily very hard and accompanied by all the 
vexations inevitably associated with a big undertaking in its first stages, before 
any settled routine has been established; nevertheless, he still found time to 
continue his statistical work and wrote several papers. 

His death at the comparatively early age of 61 was not only a heavy blow to 
his family and friends, but a great loss to statistics, as his mind retained its full 
vigour, and he would undoubtedly have continued to work for many more years. 

I am painfully conscious of the inadequacy of this sketch, which cannot hope
to convey more than a faint impression of his unique personal quality to those
who did not know him, but it will have served its purpose if it helps any readers
to grasp the essential unity and directness of the personality which lay behind
such widely varied manifestations.


(2) “STUDENT” AS STATISTICIAN 
By E. S. PEARSON


For many years after the publication of his first paper in Biometrika, in 1907, 
the name of “Student” was associated in statistical circles with an atmosphere
of romance. Those who knew him only through his written contributions 
must often have wondered who was this unassuming man, content to remain 
anonymous, who wrote so clearly and simply on so wide a range of fundamental 
topics. To those of us who came into touch with him personally, the knowledge 
that “Student” was W. S. Gosset did not altogether dispel that romantic
impression. Here, in London, he would pay us visits from time to time at the 
old Biometric Laboratory on his way to Euston station to catch the Irish mail; 
he would be wearing the grey flannel trousers that were a tradition of his
Wykehamist schooldays and carrying a rucksack on his back. And then after a
short hour’s talk, perhaps on statistical subjects, perhaps on his garden
experiments in cross-breeding, he would be off again to that wild Ireland where, in
the “bad times”, we had heard that gunmen were to be found hiding behind his 
hedges or even searching his house for arms. We had heard too of great exploits 
by members of his family of an entirely non-statistical character, of their
boat-building and of their construction of a pair of water-skis which they used for
walking over Kingstown Harbour. 

My one short winter visit to Gosset’s house at Blackrock, a few miles outside 
Dublin, would hardly by itself have cleared away this element of myth or made 
me appreciate fully the sterling values that lay beneath that friendly and 
unassuming exterior. We talked very little about statistics during my stay, and 
the strongest impressions remaining are of a morning spent among the immense 
vats and varied smells of the brewery; of drives out of town on misty evenings 
through the badly lit Dublin suburbs in that old, high two-seater Model-T Ford 
of his, christened “The Flying Bedstead”; of the warm hospitality of his
fellow-brewers; and of a Saturday in the snow-covered Wicklow Mountains when,
letting his folk go off to test the more exciting slopes, he patiently tried to teach 
me to ski on a short stretch of mountain road. 

My real understanding of Gosset as a statistician began, as no doubt for 
many others, when I joined that wide circle of his scientific correspondents. 
Perhaps to the majority of these he has stood as the friend who, with a greater 
mathematical knowledge, helped them to understand the statistical approach 
to experimental problems. In my own case the position was a little different, as
his endeavour was always to temper my mathematical reasoning with sane 
common sense. I can think of no other statistician who would have shown that 
interest and forbearance over many years to a young man who was continually 
posting to him the results of half-finished investigations for comment and 
criticism. In looking back through this correspondence I realize more clearly 
now than I could ever have done at the time what its value to me has been, and 
I can see how many of his ideas scattered through these letters have since almost 
unconsciously become part of my own outlook. I think this must be true also in 
the case of other persons with whom he corresponded, so that one can say that 
the last thirty years’ progress in the theory and practice of mathematical 
statistics owes far more to “Student” than could be realized by a mere study of
his published papers. 

One of the striking characteristics of these papers, also of course evident in 
correspondence, was the simplicity of the statistical technique he used. The 
mean, the standard deviation and the correlation coefficient were his chief tools; 
hardly adequate for treating specialized problems it might be thought; yet how 
extremely effective in fact in his skilled hands! There is one very simple and 
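illuminating theme which will be found to run as a keynote through much of his
work, and may be expressed in the two formulae:*

$$\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 + 2\rho\sigma_x\sigma_y, \qquad (1)$$

$$\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y. \qquad (2)$$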

Perhaps we may count as one of his big achievements the demonstration in 
many fields of the meaning of that short equation (2); as he wrote in 1923 
(41, p. 273, but with modified notation): 

The art of designing all experiments lies even more in arranging matters so that $\rho$ is as
large as possible than in reducing $\sigma_x^2$ and $\sigma_y^2$.

It is a simple idea, certainly, but I cannot doubt that its emphasis and 
amplification helped to open the way to all the modern developments of analysis 
of variance, and there may be some who have felt that where this technique runs 
a risk of defeating its ends by over-elaboration is just where that simple maxim 
has been set on one side. Recently I came across a short passage in a letter to 
a friend in Australia which refers to this theme and illustrates Gosset’s own 
humorously modest outlook on his own contributions. He had just received a 
good deal of criticism of a paper he read in March 1936 before the Industrial and 
Agricultural Research Section of the Royal Statistical Society (21), particularly 
because of his advocacy of the half-drill strip method of agricultural experiment. 
This is essentially a method of comparison whose efficiency depends on maxi- 
mizing correlation by taking the difference between the yields of neighbouring 
strips of the two varieties or treatments compared. He wrote: 


Meanwhile I...enclose the rough proof of what I said at the Statistical. You will 
gather from that that I am not in the fashion.... Some years ago an American referred to
difference treatment as “Student’s” method and, though at the time I referred it to Noah,
I am beginning to think that there is something in the name.†
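
The dependence of the difference method on correlation can be illustrated numerically. The sketch below uses only the standard identity for the variance of a sum or difference of two correlated quantities, $\sigma_{x\pm y}^2 = \sigma_x^2 + \sigma_y^2 \pm 2\rho\sigma_x\sigma_y$; the function names and the unit standard deviations are assumptions made for the illustration, not figures from any of Gosset’s experiments.

```python
import math


def sd_of_sum(sd_x, sd_y, rho):
    """Standard deviation of x + y for correlation rho between x and y."""
    return math.sqrt(sd_x ** 2 + sd_y ** 2 + 2.0 * rho * sd_x * sd_y)


def sd_of_difference(sd_x, sd_y, rho):
    """Standard deviation of x - y for correlation rho between x and y."""
    return math.sqrt(sd_x ** 2 + sd_y ** 2 - 2.0 * rho * sd_x * sd_y)


# Hypothetical plots with unit standard deviation: the larger the correlation
# between neighbouring strips, the more precise the difference becomes.
for rho in (0.0, 0.5, 0.9):
    print(rho, round(sd_of_difference(1.0, 1.0, rho), 3))
```

With equal standard deviations, raising the correlation from 0 to 0.9 reduces the standard deviation of the difference by a factor of about three, without any reduction in the variability of the individual yields.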

Another point which must be borne in mind in gaining a real understanding
of Gosset’s character and outlook is that all his most important statistical work 
was undertaken in order to throw light on problems which arose in the analysis 
of data connected in some way with the brewery. The subject of statistics was 
in no sense a whole-time job for him, nor, on the other hand, was it his hobby 
as it might perhaps be described in the case of W. F. Sheppard; he undertook 
theoretical investigations only when he or his colleagues were faced with 
difficulties which needed solution along statistical lines. Rapid if less accurate 
methods appealed to him because in much heavy routine work it was a question 
of finding such methods or of making no attempt at statistical treatment. He 
was in no hurry to see his results in print, and several of his papers in Biometrika 
were written in response to an editorial request rather than on his own initiative. 
In two cases at least, which I shall refer to below, he was using methods in the 
brewery ten years before publication was undertaken. He was indeed the ideal
servant of his firm, and part of the value of his life’s work would need to be
recorded in a history of progress gained by scientific research in industry rather
than in the pages of Biometrika.

* $\sigma_x$, $\sigma_y$, $\sigma_{x+y}$ and $\sigma_{x-y}$ are the standard deviations of x, of y, of x+y and of x−y respectively,
and $\rho$ is the coefficient of correlation between x and y. † See (14, p. 709).

Yet in spite of the fact that only a small part of his time was taken up with 
statistics, Gosset had a wonderful power of “getting there first” before the 
more professional statisticians. Perhaps it was because his greater detachment 
meant a continual freshness of mind. It is this characteristic, as well as those 
others I have mentioned, that I shall try to bring out in my description of his 
work in the following pages. 


EARLY STATISTICAL INVESTIGATIONS 


Gosset became one of the brewers of Messrs Arthur Guinness Son and Co., 
Ltd., in 1899. The firm had shortly before initiated the policy of appointing to 
their staff scientists trained either at Oxford or Cambridge, and these young 
men found before them an almost unexplored field lying open to investigation. 
A great mass of data was available or could easily be collected which would 
throw light on the relations, hitherto undetermined or only guessed at in an 
empirical way, between the quality of the raw materials of beer, such as barley 
and hops, the conditions of production and the quality of the finished article. 
With keen minds playing round the interpretation of these data it was almost 
inevitable that before long the need was realized of some understanding of the 
theory of errors. No doubt during the first few years of his appointment 
Gosset was mainly occupied with learning the routine work of his job, but once 
this knowledge had been gained it was natural that he, as the most mathematical 
of the younger brewers, should give his attention to the question of error theory. 
He seems to have made use of the following books: G. B. Airy, Theory of Errors 
of Observations; Lupton, Notes on Observations; M. Merriman, The Method of 
Least Squares. 

By 1904 he had made himself sufficiently familiar with the subject to draw 
up a Report on “The Application of the ‘Law of Error’ to the work of the 
Brewery”. This document, dated 3 November 1904,* opens with some paragraphs 
which set out in simple terms a case for the introduction of statistical method in 
large-scale industry. They are worth quoting since they might be put before 
many a board of directors to-day with just as much cogency as they were put 
34 years ago in Dublin: 

The following report has been made in response to an increasing necessity to set an 
exact value on the results of our experiments, many of which lead to results which are 
probable but not certain. It is hoped that what follows may do something to help us in 
estimating the Degree of Probability of many of our results, and enable us to form a
judgment of the number and nature of the fresh experiments necessary to establish or
disprove various hypotheses which we are now entertaining.


* I am extremely grateful to the firm for giving me permission to see and quote from this and
other records available in their Dublin brewery.

When a quantity is measured with all possible precision many times in succession, the
figures expressing the results do not absolutely agree, and even when the average of results, 
which differ but little, is taken, we have no means of knowing that we have obtained an 
actually true result, and the limits of our powers are that we can place greater odds in our 
favour that the results obtained do not differ more than a certain amount from the truth. 

Results are only valuable when the amount by which they probably differ from the
truth is so small as to be insignificant for the purposes of the experiment. What the odds
should be depends:


(1) On the degree of accuracy which the nature of the experiment allows, and 
(2) On the importance of the issues at stake. 


It may seem strange that reasoning of this nature has not been more widely made use of, 
but this is due: 


(1) To the popular dread of mathematical reasoning.


(2) To the fact that most methods employed in a Laboratory are capable of such
refinement that the results are well within the accuracy required.


Unfortunately, when working on the large scale, the interests are so great that more
accuracy is required, and, in our particular case, the methods are not always capable of
refinement. Hence the necessity of taking a number of inexact determinations and of
calculating probabilities.


The Report then introduces the error curve and discusses some of its 
properties. The curve is written in Airy’s form 


$$y = \frac{1}{c\sqrt{\pi}}\, e^{-x^2/c^2},$$


where c is the modulus. The method is given for estimating c from a sample of 
n observations, by calculating (a) the mean deviation, (b) the mean square 
deviation (dividing by n— 1), and using the appropriate correcting factors. It is 
stated that (b) gives a better value “in proportion 114/100”.* A numerical 
example is given and it is suggested that both methods (a) and (b) should be used 
to check one another. There is next some discussion given to what was then 
clearly a most important practical problem in the brewery: the size of sample 
needed to make the odds that the mean lay within desired limits sufficiently 
large. Chauvenet’s criterion for the rejection of extreme observations is quoted, 
as well as the modulus of the estimate of c (obtained by the mean square process), 
namely $c/\sqrt{2n}$.
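
The two ways of estimating the modulus may be sketched as follows. The Report’s actual correcting factors are not reproduced in this account, so the sketch assumes the usual large-sample normal-theory factors: the modulus is $\sqrt{\pi}$ times the mean deviation, and $\sqrt{2}$ times the root mean square deviation. The function names, the seed and the simulated sample are all hypothetical.

```python
import math
import random


def modulus_from_mean_deviation(xs):
    """Method (a): c is about sqrt(pi) times the mean absolute deviation.

    The factor sqrt(pi) is the large-sample normal value (an assumption here).
    """
    n = len(xs)
    mean = sum(xs) / n
    mean_dev = sum(abs(x - mean) for x in xs) / n
    return math.sqrt(math.pi) * mean_dev


def modulus_from_mean_square(xs):
    """Method (b): c = sqrt(2) * s, with s^2 computed by dividing by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return math.sqrt(2.0 * s2)


def modulus_of_estimate(c, n):
    """Modulus of the mean-square estimate of c, as quoted: c / sqrt(2n)."""
    return c / math.sqrt(2.0 * n)


# Large-sample ratio of the sampling variances of (a) to (b) under normality;
# pi - 2 is about 1.14, in agreement with the "114/100" of the Report.
variance_ratio = math.pi - 2.0

# Simulated check on a normal sample with unit standard deviation, for which
# the true modulus is sqrt(2); the two methods should roughly agree.
random.seed(1904)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
print(modulus_from_mean_deviation(sample), modulus_from_mean_square(sample))
print(round(variance_ratio, 2))
```

Using both estimates as a mutual check, as the Report suggests, costs little: a wide disagreement between them points either to an arithmetical slip or to a sample that departs markedly from the error curve.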

All this is simply Airy or Merriman put by Gosset into the form most useful 
for his fellow brewers. What, however, shows a flash of his own insight is the 
use which he makes of Airy’s theorems on the “Error of the result of the 
addition (or subtraction) of fallible measures”. Thus if 


$$W = X \pm Y \pm Z \pm \text{etc.},$$


and $E$, $e$, $f$, $g$, etc., are the probable errors (or alternatively the moduli or the
mean errors) of W, X, Y, Z, ... respectively, then Airy gives the law

$$E^2 = e^2 + f^2 + g^2 + \text{etc.}$$

Gosset had noticed in certain cases he had met with that the result $E^2 = e^2 + f^2$
did not hold, as it should according to this law, for both W = X + Y and
W = X − Y. In other words he found that if W, X and Y are measured from
their means there was very considerable difference between Sum $(X + Y)^2$ and
Sum $(X - Y)^2$. He concluded that when this was the case it was a sign of the
existence of a correlation between the variables. Thus he was feeling his way
towards the fundamental relations (1) and (2) above, but he had not
yet been introduced to the correlation coefficient.

* This is the ratio of the sampling variances of (a) the mean deviation, and (b) root mean
square deviation estimates of c, in large samples. I do not know from what source Gosset obtained
these figures. The full value of the standard error of the mean deviation for samples of any size
from a normal population was first derived, I believe, by Helmert (1876), but Gosset could not have
known of this paper.
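
Gosset’s check, comparing Sum $(X + Y)^2$ with Sum $(X - Y)^2$ for deviations from the means, can be sketched in a few lines. Since the difference of the two sums equals four times Sum $XY$, dividing it by $4\sqrt{\text{Sum } X^2 \cdot \text{Sum } Y^2}$ gives the ordinary product-moment correlation. That last step uses the modern coefficient, which Gosset did not yet have in 1904; the function names and the figures are invented for the illustration.

```python
import math


def sums_of_squares(xs, ys):
    """Return Sum (X+Y)^2, Sum (X-Y)^2, Sum X^2 and Sum Y^2 for deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    s_plus = sum((a + b) ** 2 for a, b in zip(dx, dy))
    s_minus = sum((a - b) ** 2 for a, b in zip(dx, dy))
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    return s_plus, s_minus, s_xx, s_yy


def implied_correlation(xs, ys):
    """Correlation recovered from the gap between the two sums of squares."""
    s_plus, s_minus, s_xx, s_yy = sums_of_squares(xs, ys)
    return (s_plus - s_minus) / (4.0 * math.sqrt(s_xx * s_yy))


# Hypothetical figures: the second series rises with the first, so the two
# sums differ widely and Airy's law fails for both W = X + Y and W = X - Y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(sums_of_squares(xs, ys)[:2])
print(implied_correlation(xs, ys))
```

When the two sums agree the variables behave as independent and Airy’s law applies; the wider the gap between them, the stronger the correlation, positive if the sum of squares of X + Y is the larger.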

The concluding remarks of the Report are interesting:

We may point out that, although the proof of the law (of Error) rests on higher 
mathematics, the application of it only demands quite simple algebra. We have been met 
with the difficulty that none of our books mention the odds, which are conveniently accepted
as being sufficient to establish any conclusion, and it might be of assistance to us to consult
some mathematical physicist on the matter.


This last difficulty was repeated in the summary which contains the sentence: 


Explains that we have no information of the degree of probability to be accepted as 
proving various propositions, and suggests referring this question to a mathematician. 

It is curious perhaps that Gosset should have felt at first that a mathe- 
matician was needed to solve this particular problem, which is just the point 
which the mathematician would now consider that the practical man must 
answer.* As we shall see in a moment he changed his view, but it seems to 
have been uncertainty on this question which led almost at once to that 
important contact between Gosset and Karl Pearson. A minute of March 1905 
added to the printed Report indicates that arrangements for this meeting are to 
be made. 

The interview was arranged through Vernon Harcourt, a chemistry don at 
Oxford whose pupil Gosset may have been and who perhaps got into touch with 
Pearson through Weldon, who was then Professor of Comparative Anatomy at 
Oxford. The opportunity for a meeting came about 12 July 1905 when Pearson 
was spending his long vacation at East Ilsley in Berkshire and Gosset bicycled 
over from his father’s house at Watlington, preceded by a list of questions from 
which the following paragraphs are taken: 


(1) My original question and its modified form. When I first reported on the subject, 
I thought that perhaps there might be some degree of probability which is conventionally 
treated as sufficient in such work as ours and I advised that some outside authority should 
be consulted as to what certainty is required to aim at in large scale work. However it 
would appear that in such work as ours the degree of certainty to be aimed at must depend 


* I have, however, heard of another very recent case where an industrialist considered that it 
was the mathematical statistician’s job to suggest the appropriate odds to use. 

on the pecuniary advantage to be gained by following the result of the experiment, 
compared with the increased cost of the new method, if any, and the cost of each experi- 
ment. This is one of the points on which I should like advice. 


(2) Another problem. I find out the P.E. of a certain laboratory analysis from n analyses 
of the same sample. This gives me a value of the P.E. which itself has a P.E. of P.E./$\sqrt{2n}$. 
I now have another sample analysed and wish to assign limits within which it is a given 
probability that the truth must lie. E.g. if n were infinite, I could say “it is 10 : 1 that the 
truth lies within 2.6 of the result of the analysis”. As however n is finite and in some cases 
not very large, it is clear that I must enlarge my limits, but I do not know by how much. 


(3) What is the right way to establish a relationship between sets of observations? I use the 
following method when endeavouring to establish a relationship between sets of observations, 
but I have reason to suppose that it is not a good way and would like criticism on my 
method and advice as to the proper way. Suppose observations A and B taken daily of two 
phenomena which are supposed to be connected. Let $A_1$, $A_2$, $A_3$, etc. be the daily A observations 
and let $B_1$, $B_2$, $B_3$, etc. be the daily B observations. (I reduce the B observations if 
necessary or increase them by multiplying by a constant so that the P.E. of the A and B is 
about the same.) Then I form two series $A_1 + B_1$, $A_2 + B_2$, etc. and $A_1 - B_1$, $A_2 - B_2$, etc. and 
find the P.E. of each of the new series. If they are markedly different, it is clear (sufficient 
observations being taken) that the original series A and B are connected and proceed to 
attempt to find it quantitively. I cannot however at present find the P.E. of my results, 
nor can I be quite sure how great a difference between the P.E.’s of the sum and difference 
series is necessary to shew the connection. 


(4) What books would be useful? When you talk with me you will doubtless find out 
many other points on which I require enlightenment and could perhaps recommend me 
some books on the subject. 


The solution of “another problem” was to be given 2½ years later in Gosset’s 
paper on “The probable error of a mean” (2). The method described in 
paragraph (3) is interesting. I do not know exactly how Gosset attempted to 
measure the relationship quantitively, but if, as would seem natural, he 
compared the difference between $\Sigma(A + B)^2$ and $\Sigma(A - B)^2$ with their average, 
then by adjusting the scale so as to make the P.E.’s of A and B approximately 
the same, he had secured a maximum value for this ratio, and therefore 
presumably minimized the risk of overlooking a relationship. For 

$$\frac{\Sigma(A + B)^2 - \Sigma(A - B)^2}{\tfrac{1}{2}\{\Sigma(A + B)^2 + \Sigma(A - B)^2\}} = \frac{4\, r_{AB}\, \sigma_A \sigma_B}{\sigma_A^2 + \sigma_B^2},$$

which, for a given value of $r_{AB}$, has a maximum value of $2r_{AB}$ when $\sigma_A = \sigma_B$. 
One feels that, given a little more time, with his unerring instinct for reaching 
the best solution, Gosset would have found for himself Galton’s correlation 
coefficient, just as he was later to rediscover Poisson’s limit to the binomial and 
Helmert’s distribution of a squared standard deviation. 
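
The scaling argument can be checked numerically. The following is a minimal Python sketch rather than anything taken from Gosset's reports: the two short series are invented for illustration, and the function names are hypothetical. It forms the ratio of the difference between Sum (A + B)² and Sum (A − B)² to their average, and compares it with 2r when the two series have the same spread and when one of them is put on a scale ten times larger.

```python
import math


def deviations(values):
    """Measure a series from its own mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def sum_difference_ratio(a, b):
    """Difference between Sum(A+B)^2 and Sum(A-B)^2, divided by their average."""
    da, db = deviations(a), deviations(b)
    s_plus = sum((x + y) ** 2 for x, y in zip(da, db))
    s_minus = sum((x - y) ** 2 for x, y in zip(da, db))
    return (s_plus - s_minus) / (0.5 * (s_plus + s_minus))


def correlation(a, b):
    """Product-moment correlation coefficient r."""
    da, db = deviations(a), deviations(b)
    s_ab = sum(x * y for x, y in zip(da, db))
    return s_ab / math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))


# Invented daily observations, purely for illustration.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]    # same spread as a
b_rescaled = [10.0 * y for y in b]     # same series on a scale ten times larger

r = correlation(a, b)
ratio_equal_scales = sum_difference_ratio(a, b)
ratio_unequal_scales = sum_difference_ratio(a, b_rescaled)

print(f"r = {r:.4f}, 2r = {2 * r:.4f}")
print(f"ratio with equal scales   = {ratio_equal_scales:.4f}")
print(f"ratio with unequal scales = {ratio_unequal_scales:.4f}")
```

With equal spreads the ratio coincides with 2r, while the rescaled series gives a smaller ratio although r itself is unchanged. This is the sense in which equalizing the P.E.'s guards against overlooking a relationship.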

Among Pearson’s rough jottings written down for Gosset at the interview is 
the basic formula that he needed, 


$$\sigma_{A \pm B}^2 = \sigma_A^2 + \sigma_B^2 \pm 2\, r_{AB}\, \sigma_A \sigma_B$$


(with the letter r doubly underlined), the probable error formula for r and also 
references to a number of papers on the theory of statistics. 

Gosset was a quick learner; the immediate results of this visit include a 
Supplement to the brewery Report of 1904, from which I have quoted, and a 
second Report on correlation dated 30 August 1905. In both of these the 
influence of new ideas received from Pearson is evident. The Supplement contains 
a warning that distributions may not always be normal, although in 
small sample problems “it is practically convenient to use a curve ... which 
has been thoroughly investigated, of which the values have been tabled, and 
which in the majority of cases describes them ‘within the error of random 
sampling’”. His colleagues are also advised to use the standard deviation and 
not the mean error. The Report is headed “The Pearson Co-efficient of Correlation”, 
and describes, with a numerical example, the method of calculating this 
coefficient, r, as well as the use of the regression straight line for prediction. 

This idea of correlation, which in origin is of course Galton’s rather than 
Pearson’s, has more than once during the past fifty years brought with it a 
stimulus leading to fresh discovery. The conception, presented with all its 
novelty to minds which had hitherto only considered the perfect relationship of 
the physicist as a relationship which could be scientifically handled, has seemed 
to provide a key to the solution of a host of problems. The inspiration which 
Galton’s discussion of correlation in his Natural Inheritance gave to Weldon and 
Pearson in the early nineties has often been referred to and, now, the introduction 
of the new ideas opened out fresh avenues of research to both Gosset and his 
colleagues. The crude method which Gosset had invented of examining the 
difference between $\Sigma(A + B)^2$ and $\Sigma(A - B)^2$ could be abandoned. It became 
possible to assess with precision the relative importance of the many factors 
influencing quality at different stages in the complicated process of brewing, and 
before long the methods of partial and multiple correlation were mastered and 
applied.* The Reports circulated within the brewery constantly quote correlation 
coefficients and their probable errors, while Gosset’s rough notebooks of this 
date contain numerous correlation tables. Apart from the actual calculation 
of r, the idea of arranging data in a two-way table was possibly novel and 
certainly illuminating to the brewers. 

It seems, however, to have been at once obvious to Gosset that the methods 
developed by Pearson and his co-workers for handling the large samples met with 
in biometric inquiries would probably need modification when applied to the 
problems of the brewery. In his Report on correlation of August 1905 he notes 
that “correlation coefficients are usually calculated from large numbers of cases, 
in fact I have only found one paper in Biometrika of which the cases are as few 
in number as those at which I have been working lately”. He was dealing at 
this time with all the possible correlations between a number of characters for 
which 31 observations were available; in another problem only 10 observations 


* A Report of Gosset’s of June 1907 applies multiple correlation to prediction. The mathematical 
Appendix dated 27 September 1906 is stated to have been read through by Karl Pearson. 
could be used. He gives a reason which, though faulty, is extremely interesting, 
for doubting the validity of the probable error formula for r in small samples. Thus, 
if r is an observed correlation from a sample of n individuals, he takes the ratio 

$$\frac{\text{Deviation of } r \text{ from zero}}{\text{Probable error of } r} = \frac{r}{0.6745\,(1 - r^2)/\sqrt{n}} \tag{6}$$

as a measure of the significance of the correlation, remarking that if the ratio is 
greater than 2½ the odds are about 20 : 1 on the existence of a real relationship. 
He then says that if n be very small “I expect a larger ratio is required”, and 
illustrates this by supposing that r = 0.9, n = 4, when the probable error calculated 
as in (6) becomes 0.064 and the ratio is 14. “Yet”, he remarks, “no one 
would claim any certainty from four experiments.” 

If we are asking whether an observed r is consistent with sampling from a 
population in which the correlation, say $\rho$, is zero, then the appropriate probable 
error is approximately $0.6745/\sqrt{n}$ and not the value used in (6). Thus in Gosset’s 
example the ratio is really 2.7 and not 14; as he was afterwards to show, it was 
not the standard error that was seriously at fault in testing significance when 
dealing with small samples, but the assumption of normality. For $n = 4$, $\rho = 0$, the 
distribution of r is rectangular. The faulty reasoning involved in the interpretation 
of equation (6) has been used again and again in statistical literature; 
the reason that in 1905 the difficulty had not caught the attention of the workers 
at the Biometric Laboratory was that they were dealing with large samples and, 
for these, the error involved is of relatively small consequence. It was Gosset, 
“naughtily” playing about with absurdly small numbers,* who stumbled on 
the inconsistency, although not at first understanding its reason. Here perhaps 
we may see the first illustration of the tremendous gain in clear thinking that 
has followed in statistics from an approach to the subject from the small-sample 
end. Also this is one of the many occasions on which Gosset was first on the spot. 
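
The arithmetic of this example is easily reproduced. The short Python sketch below (the function names are hypothetical) evaluates the ratio for r = 0.9 and n = 4, first with the probable error as used in (6) and then with the approximate probable error appropriate when the population correlation is zero. It also uses the rectangular form of the distribution of r for n = 4 to obtain the chance of an |r| of 0.9 or more when there is no real correlation.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def pe_as_in_6(r, n):
    """Probable error of r as used in (6): 0.6745 (1 - r^2) / sqrt(n)."""
    return PE_FACTOR * (1.0 - r * r) / math.sqrt(n)


def pe_zero_correlation(n):
    """Approximate probable error of r when the population correlation is zero."""
    return PE_FACTOR / math.sqrt(n)


r, n = 0.9, 4

ratio_as_in_6 = r / pe_as_in_6(r, n)        # about 14
ratio_null = r / pe_zero_correlation(n)     # about 2.7

# For n = 4 and zero population correlation, r is rectangular on (-1, 1),
# so the chance that |r| >= 0.9 is the excluded length 2(1 - r) over the range 2.
chance_abs_r_at_least = 2.0 * (1.0 - r) / 2.0

print(f"ratio using (6)            = {ratio_as_in_6:.1f}")
print(f"ratio with rho = 0         = {ratio_null:.1f}")
print(f"chance of |r| >= 0.9, n=4  = {chance_abs_r_at_least:.2f}")
```

On the rectangular distribution a sample value as extreme as 0.9 would arise about once in ten trials, which agrees with Gosset's remark that no one would claim any certainty from four experiments.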

There were other difficulties in application that he was already turning over 
in his mind. For instance, he wished to obtain a combined measure of the 
correlation between two characters measured on several varieties of the barley 
used for malting and he considered the possibility of taking deviations from 
variety means. “I hope to find out the limitations of this device at some later 
date”, he reported. “I am using it and similar devices pretty freely. ...” 

A point which may be of interest to industrial statisticians to-day is that the 
practical brewer of thirty years ago, as the practical engineer to-day, was 
objecting to the introduction into his reports of the statistician’s term popula- 
tion, yet was unable to suggest an appropriate substitute. A footnote to the 
word population ran as follows: “This appears to be a general statistical term to 


* Writing to Gosset on 17 September 1912 on the subject of the standard deviation, not correlation, 
Pearson remarked that it made little difference whether the sum of squares was divided by 
n or (n − 1), “because only naughty brewers take n so small that the difference is not of the order 
of the probable error!” 

express a number of things or people of the same kind. We have tried to find a 
word in common use to express this, but have failed.” 

The Report closes with a characteristic piece of sound advice: 

It must be borne in mind, however, that the better the instrument the greater the 
danger of using it unintelligently: it is more important than ever to think carefully in what 
way any connection may have arisen accidentally, and, more especially, any semi-constant 
variation must be treated with particular care. 

Statistical examination in each case may help much, but no statistical methods will 
ever replace thought as a way of avoiding pitfalls, though they may help us to bridge 
them. 


THE YEAR IN LONDON, 1906-7, AND THE WORK ON SMALL SAMPLES 


Following a general practice of the brewery, Gosset was sent away from 
Dublin for a year’s specialized study. He spent the greater part of this time 
either working at or in close contact with the Biometric Laboratory, where he 
arrived at the end of September 1906. During the year which had elapsed since 
he first met Karl Pearson he must have given a great deal of time and thought 
to the application of current statistical methods to the type of experimental and 
routine data analysed in the brewery. He was now anxious to obtain Pearson’s 
opinion on the work he had already done and to ask his advice on a number of 
unsolved questions. Probably he had already realized that the most important 
problem on which he required further information was the behaviour of 
frequency constants in small samples. In a letter written to a friend at the 
brewery on 30 September 1906, just after his arrival, he outlines, however, only 
a modest programme: 


Then he [K. P.] proposes to give me a room to work in, that I should attend his lectures, 
and become as far as possible accustomed to the calculations, etc., of his department. I had 
a long talk with him, and told him the lines I had been going on in the Hops ..., and he 
seemed to consider that I had been over most of the ground, but points soon cropped up 
which showed him the necessity for going deeper. I think that from what he said I am more 
or less on the right lines so far; perhaps when the reports have been considered you might 
let me have a copy of each of them, to ask about anything which may have occurred to me 
by then about them. I think he would be very willing to give us advice on any points which 
crop up. 

The first problem which he took up was of considerable practical importance 
in one department of the brewery activities: the question of the sampling error 
involved in counting yeast cells with a haemacytometer. In his paper (1) 
published early in 1907 he derived afresh Poisson’s limit to the binomial 
distribution, namely, 

$$e^{-m}\left(1 + m + \frac{m^2}{2!} + \dots + \frac{m^r}{r!} + \dots\right), \tag{7}$$

and showed by a comparison of the series with four sets of experimental results 
that it did represent well the observed distribution of cell counts in an investigation 
carried out under carefully controlled conditions. The paper should be 
read in conjunction with another that he wrote on the same subject twelve years 
later (9). 

The derivation of the limiting form of the binomial was not in itself an 
achievement of any special difficulty; the series has been obtained independently 
from time to time by a number of investigators. But it was characteristic of 
“Student’s” flair or, as he himself would have said, luck that when he had a 
practical problem to solve he should go straight to the correct solution; and that 
because it was a fundamental type of biological problem his research should have 
been of much greater value in the field of applied statistics than von Bortkiewicz’s 
work, illustrated by fitting the Poisson series to suicides of German women 
and deaths of Prussian soldiers from the kicks of a horse. 
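
The passage to the limit can be illustrated numerically. In the following Python sketch the mean m = 2 and the numbers of trials are arbitrary choices made for illustration and are not figures from Gosset's paper. The binomial terms for n trials with chance m/n are compared with the terms e^{-m} m^r / r! of the Poisson series, and the largest discrepancy is recorded as n increases.

```python
import math


def binomial_term(r, n, m):
    """Chance of r successes in n trials when the chance of success is m/n."""
    p = m / n
    return math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)


def poisson_term(r, m):
    """General term of the Poisson series: e^{-m} m^r / r!."""
    return math.exp(-m) * m ** r / math.factorial(r)


m = 2.0  # illustrative mean, chosen arbitrarily

# Largest absolute difference between corresponding terms, r = 0, ..., 10.
max_gap = {}
for n in (10, 100, 10000):
    max_gap[n] = max(
        abs(binomial_term(r, n, m) - poisson_term(r, m)) for r in range(11)
    )

for n, gap in max_gap.items():
    print(f"n = {n:>5}: largest difference between terms = {gap:.6f}")
```

The discrepancy shrinks steadily as n grows with the mean held fixed, which is the limiting property that the yeast-cell counts were found to obey.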

I have reproduced in facsimile in Plate II two pages from Gosset’s notebook 
containing the rough working for this paper. The experimental data are those 
of the series IV (see his p. 357). They are quoted also as an example of a 
Poisson distribution by R. A. Fisher in Statistical Methods for Research Workers 
(1938, p. 58). The left-hand page contains the 400 individual yeast cell counts 
and the resulting frequency distribution and histogram; the right-hand page 
shows the calculation of the mean, m, as well as the theoretical series and the 
derivation of $\chi^2$. The expression $N/\sqrt{2\pi m q}$ (or $N/\sqrt{2\pi n q}$), where q is put equal 
to unity, is an approximation to the frequency in the group containing the mean. 
In the notes, Gosset seems to have reached this result by a rather lengthy method, 
but it can be obtained by putting r = m in the general term of series (7) and using 
the first order term in Stirling’s approximation to m! No reference to this comparison 
was made in the published paper. A few figures, which are in pencil in the 
notebook, appear to be in Pearson’s hand; e.g. the theoretical frequencies 3.712, 
17.37 and 40.65 as well as the three terms of the Poisson series at the bottom of 
the page. They were jottings made no doubt by K. P. on one of his daily 
“rounds” of the laboratory. 
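
The pencilled figures can be checked against the Poisson series. The Python sketch below rests on an assumption that should be stated plainly: that 3.712, 17.37 and 40.65 are the theoretical frequencies for 0, 1 and 2 cells among the 400 counts. On that assumption the mean m follows from the first figure, since the first term of the series is N e^{-m}. The remaining terms and the approximation N/√(2πm) are then computed; the function name is hypothetical.

```python
import math

N = 400                 # number of individual yeast cell counts
freq_zero = 3.712       # assumed to be the theoretical frequency of 0 cells

# The first term of the series is N * exp(-m), which fixes m.
m = math.log(N / freq_zero)


def poisson_frequency(r, mean, total):
    """Theoretical frequency of r cells: total * e^{-mean} * mean^r / r!."""
    return total * math.exp(-mean) * mean ** r / math.factorial(r)


expected = [poisson_frequency(r, m, N) for r in range(13)]

# Putting r = m in the general term and using Stirling's first-order
# approximation to m! gives N / sqrt(2 * pi * m) for the central group.
central_approx = N / math.sqrt(2.0 * math.pi * m)

print(f"m = {m:.3f}")
print("first three terms:", [round(f, 2) for f in expected[:3]])
print(f"largest term = {max(expected):.2f}, approximation = {central_approx:.2f}")
```

On this reading the second and third terms reproduce the pencilled 17.37 and 40.65, and the Stirling approximation falls within a fraction of a unit of the largest theoretical frequency.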

A good part of the work on Gosset’s second paper on “The probable error of 
a mean” (2) was also carried out during his year in England; with it is closely 
associated his third paper on the “Probable error of a correlation coefficient” (3), 
as both were supported by the same piece of experimental sampling. I have 
already referred to Gosset’s doubts regarding the distribution of r in small 
samples; since in the brewery work a mean value had often to be estimated from 
eight or ten determinations he also felt uneasy about the applicability to such 
work of accepted theory regarding the distribution of the mean and the standard 
deviation. A letter written on 12 May 1907 to a colleague in Dublin shows him 
to be in the middle of his investigation. After dealing with some points about the 
significance of differences* he adds: 


Herewith my answer to your questions. I hope it is quite clear, but I am afraid I rather 
increase the difficulties when I try to explain anything as a rule. 


* There is a reference to that long-standing difference of opinion regarding n and n − 1, in the 
following sentence: “When you only have quite small numbers, I think the formula we used to use 
for the P.E. ($\sqrt{\{\Sigma(x^2)/(n - 1)\}} \times 0.6745$) is better, but if n be greater than 10 the difference is too small 
to be worth taking the extra trouble.” Here K. P. and Airy were in disagreement. 

Page from Gosset’s notebook containing the analysis of haemacytometer counts. (Left-hand page.) 

What I have written on the back is true for large samples, and approximately so for 
small, and is the accepted theory. My work on small numbers may or may not modify it. 
We shall know later. ... 

I go up to K.P.’s lectures from here [The Ousels, Tunbridge Wells] and on other days 
work at small numbers: a greater toil than I had expected, but I think absolutely necessary 
if the Brewery is to get all the possible benefit from statistical processes. 

There could be no better illustration than these last sentences of the way in 
which Gosset’s best work was called forth in the service of his firm. 

The contents of the paper on the probable error of the mean are too well 
known to require more than a brief summary. Starting with a sample of n 
observations, $x_1, x_2, \dots, x_n$, from a normal population with standard deviation $\sigma$ 
and mean at the origin for x, Gosset obtained the sampling moments of 
$s^2 = \Sigma(x - \bar{x})^2/n$, where $\bar{x}$ is the sample mean. He showed that these moments 
corresponded exactly with those of a Pearson type III curve and hence inferred 
that the curve representing the sampling distribution of $s^2$ must almost certainly be 

$$y = \text{constant} \times (s^2)^{(n-3)/2}\, e^{-ns^2/(2\sigma^2)}. \tag{8}$$*

He then showed that the correlation coefficient between $\bar{x}^2$ and $s^2$ was zero and, 
making the assumption (which does not necessarily follow though in fact it is 
true in this case) that this meant that $\bar{x}$ and $s$ were absolutely independent, 
he deduced the probability distribution of $z = \bar{x}/s$ as 

$$p(z) = \text{constant} \times (1 + z^2)^{-n/2}. \tag{9}$$

He considered the properties of this curve,† gave a table of its probability 
integral for n = 4 to 10 and examined its approach to a normal curve with 
standard deviation $1/\sqrt{n - 3}$. He next compared the distributions (8) and (9) 
with the results of a sampling experiment for the case n = 4 and finally illustrated 
the use of his results on four examples. 

When two years ago the question of the photographic reissue of the paper 
had been suggested to meet a continued demand for offprints, Gosset wrote to 
me describing it as now “‘rather a museum piece”. That is true, though perhaps 
in a different sense than he meant. It is a paper to which I think all research 
students in statistics might well be directed, particularly before they attempt 
to put together their own first paper. The actual derivation of the distributions 
of $s^2$ and $z$, or of $t = \sqrt{n - 1}\, z$ in to-day’s terminology, has long since been made 
simpler and more precise; this analytical treatment need not be examined 
carefully, but there is something in the arrangement and execution of the paper 
which will always repay study. 

In the first place, in the Introduction and Conclusions we find an excellent 
illustration of Gosset’s wise advice given to a beginner in the art of composition: 
“First say what you are going to say, then say it and finally end by saying that 


* That this result had previously been derived by Helmert (1876), English-speaking statisticians 
were quite unaware till many years later. 
† There are some minor errors in §§ IV and V of the paper. 

you have said it.”* The main part of the paper, the “saying it”, is divided 
clearly into headed sections. The adequacy of the assumptions on which the 
mathematical theory rests is tested by a piece of experimental sampling; this 
test being satisfactorily passed, computed tables required for application are 
given and finally a number of well-chosen examples illustrate the purpose of the 
inquiry. 

Before considering some other notable features of the paper and attempting 
to assess its influence on later work, it is important to see just what was the 
main purpose of the inquiry that its author had in mind. As usual with him, 
this was simple and practical. Having n observations, he wished to know within 
what limits the mean of the sampled population—the “true result” of the 1904 
Report—probably lay. His solution involved a tacit introduction of the method 
of inverse probability, but I do not think he ever tried to put this into precise 
terms.† Thus the last sentence on the first page of the paper runs as follows: 

The usual method of determining the probability that the mean of the population lies 
within a given distance of the mean of the sample, is to assume a normal distribution 
about the mean of the sample with a standard deviation equal to $s/\sqrt{n}$, where s is the 
standard deviation of the sample, and to use the tables of the probability integral. 

The results of the present investigation meant to Gosset that he could now 
assume in small samples a z-distribution for the population mean about the 
sample mean, the scale now being the sample standard deviation, s. In his 
examples he uses the z tables, not to test the hypothesis that the population 
mean is zero or has some other specified value, but to find the odds that this 
mean lies within specified limits, e.g. between 0 and $\infty$, that is to say is positive. 
Take for instance his Illustration 1 (pp. 20-1); the average number of hours of 
sleep gained by ten patients treated with D. hyoscyamine hydrobromide is 
$\bar{x} = 0.75$ while the standard deviation is $s = 1.70$. If we regard the population 
mean, say $\xi$, to be distributed about the sample mean 0.75 in the z-form, with 
a standard deviation of s, it follows that the chance that $\xi > 0$ is the proportionate 
area under the z-curve between the ordinate at 

$$z = \frac{0 - 0.75}{1.70} = -0.44$$

and $\infty$. This is the same as the chance that $z < +0.44$, which interpolation in his 
tables in the column n = 10 shows to be 0.887. He therefore argued that the odds 
are 0.887 to 0.113 that the population mean $\xi$ is positive, i.e. that the soporific will 


* The advice was not originally Gosset’s. Writing in 1934 he says: “This is a rule which we owe 
to A. J. (I think at second hand).” He then quotes the rule and adds, “It does make things so much 
easier for everybody concerned, besides which ‘what I tell you three times is true’” ; the last words 
are those of the Bellman in The Hunting of the Snark. 

† In his paper on the correlation coefficient written in the same year (3, p. 302) Gosset states 
definitely that a knowledge of the a priori probability distribution of the population correlation 
coefficient, R, is needed in order to determine “the probability that R ... shall lie between any 
given limits”. 

on the average give an increase of sleep. While a somewhat loosely defined conception 
of inverse probability seems to underlie the argument, it will be seen that as 
far as the practical consequences go, Gosset had reached a result which we can 
hardly improve on 30 years later. It is true that, using the idea of the fiducial or 
confidence interval, some of us would word our statement of limits and proba- 
bilities a little differently so as to avoid any appeal to inverse probability, but as 
practical statisticians we must, I think, admit that our conclusions would be 
identical. 
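
The figure read from the z tables can be approximated by direct integration of the z-curve. The Python sketch below is an illustration under stated assumptions and not Gosset's method of tabulation: the normalizing constant is the standard gamma-function value for a density proportional to (1 + z²)^(-n/2), the area is found by Simpson's rule, and the function names are hypothetical.

```python
import math


def z_density(z, n):
    """Density proportional to (1 + z^2)^(-n/2), normalised to unit area."""
    constant = math.gamma(n / 2.0) / (
        math.sqrt(math.pi) * math.gamma((n - 1) / 2.0)
    )
    return constant * (1.0 + z * z) ** (-n / 2.0)


def z_probability_below(z, n, steps=2000):
    """P(Z < z) by Simpson's rule on [0, |z|], using symmetry about zero."""
    upper = abs(z)
    h = upper / steps                      # steps must be even
    total = z_density(0.0, n) + z_density(upper, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * z_density(i * h, n)
    half_area = total * h / 3.0
    return 0.5 + half_area if z >= 0 else 0.5 - half_area


# Illustration 1: ten patients, mean gain 0.75 hours, s = 1.70.
mean_gain, s, n = 0.75, 1.70, 10
z = mean_gain / s
chance_positive = z_probability_below(z, n)

print(f"z = {z:.2f}")
print(f"chance that the population mean is positive = {chance_positive:.3f}")
print(f"odds = {chance_positive:.3f} to {1.0 - chance_positive:.3f}")
```

Direct integration gives a chance of about 0.89, in reasonable agreement with the 0.887 obtained by interpolation in the tables; the small difference does not affect the practical conclusion.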

There are some other features of the paper which are interesting historically. 
Gosset remarks on p. 13 that before he succeeded in solving the problem analy- 
tically, he had endeavoured to do so empirically. The sampling experiment 
which he carried out for this purpose involved the drawing of 750 samples of 4 
by means of shuffled slips of cardboard, from W. R. Macdonell’s (1901) correla- 
tion table containing the distribution of height and middle-finger length of 3000 
criminals. As far as I know this was the first instance in statistical research of 
the random sampling experiment which since has become a common and useful 
feature in a large number of investigations where precise analysis has failed. 
The results of this same experiment were used by Gosset in a number of later 
papers. On p. 16 he draws attention to a difficulty in the application of 
Pearson’s $\chi^2$-test of goodness of fit which was later to lead to R. A. Fisher’s 
modification in terms of degrees of freedom. On p. 19 he gives reasons for 
believing that even when the population sampled is not normal the sampling 
distribution of z will be very little modified; this was a prediction which 
experimental and theoretical investigations carried out in recent years have 
confirmed. 
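
A sampling experiment of this kind is now easily imitated. The Python sketch below is a modern stand-in and not a reconstruction of Gosset's procedure: in place of shuffled slips drawn from Macdonell's table it draws 750 samples of 4 from a pseudo-random normal population with mean at the origin. The seed and the function names are arbitrary. The observed proportion of samples with |z| < 1 is compared with the value given by the z-curve for n = 4, whose integral has a simple closed form.

```python
import math
import random


def z_statistic(sample):
    """z = mean / s, where s^2 = sum (x - mean)^2 / n (divisor n)."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return mean / s


def z_cdf_n4(z):
    """P(Z < z) for n = 4, from integrating (2 / pi) (1 + z^2)^(-2)."""
    return 0.5 + (math.atan(z) + z / (1.0 + z * z)) / math.pi


rng = random.Random(1908)          # arbitrary seed, for repeatability
n_samples, sample_size = 750, 4

z_values = [
    z_statistic([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
    for _ in range(n_samples)
]

observed = sum(1 for z in z_values if abs(z) < 1.0) / n_samples
theoretical = z_cdf_n4(1.0) - z_cdf_n4(-1.0)

print(f"observed proportion with |z| < 1    = {observed:.3f}")
print(f"theoretical proportion (n = 4)      = {theoretical:.3f}")
```

With 750 samples the observed proportion should fall within a few hundredths of the theoretical value, which gives some idea of the precision Gosset could expect from an experiment of this size.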

Finally we may note the introduction of a difference in notation to dis- 
tinguish between sample and population characters, viz. s for the sample and o 
for the population standard deviation. The need for this distinction seems 
obvious to us to-day, but it is interesting to notice that it was only when 
attention was directed to the problem of small samples that statisticians grasped 
the clarification resulting from this innovation. 

As the theory of mathematical statistics has developed, the significance of 
“Student’s” test has been elaborated from many angles and deeper meanings 
associated with it than its author had ever dreamed of. This is a common 
feature of scientific progress, but as Neyman very appropriately remarked on a 
recent occasion (1937, p. 142): “The role of a rigorous scientific theory is frequently 
very modest and is reduced to explaining to the practical man—and 
this sometimes with a certain difficulty—how good is what he himself knew to 
be good long ago.” To understand the reason for the historical importance that 
has rightly been associated with this paper, it is not however necessary to discuss 
the abstract conceptions of the mathematical statistician and their relation to 
forms of critical regions in hyperspace; it can be explained much more simply 
than that. As Gosset wrote on the second page of the paper, referring to the 
inadequacy for certain purposes of the statistical technique available in 1908: 

There are other experiments, however, which cannot easily be repeated very often; in 
such cases it is sometimes necessary to judge of the certainty of the results from a very 
small sample, which itself affords the only indication of the variability. Some chemical, 
many biological, and most agricultural and large scale experiments belong to this class, 
which has hitherto been almost outside the range of statistical inquiry. 


It is probably true to say that this investigation published in 1908 has done 
more than any other single paper to bring these subjects within the range of 
statistical inquiry; as it stands it has provided an essential tool for the practical 
worker, while on the theoretical side it has proved to contain the seed of new 
ideas which have since grown and multiplied an hundredfold. 

The sampling experiment used to test the accuracy of the theoretical distributions 
of $s^2$ and $z$ was also planned to throw light on the distribution of the 
correlation coefficient $r$, in very small samples. In this second problem (3) 
Gosset was forced to rely much more on his empirical approach than before, 
since the mathematical solution lay beyond his powers. In suggesting the 
probable form of the distribution of r when sampling from a population in which 
the two variables were uncorrelated (i.e. R = 0)* he could get no clue from known 
values of moments as in the case of $s^2$. He started from the following basis: 
(a) the distributions must be symmetrical about r = 0 and be limited within the 
range −1 to +1; (b) he had available the distributions of r found from his 
experiment for 745 samples of 4 and 750 samples of 8; (c) of these, he noticed 
that the former was approximately rectangular. 

As in the case of $s^2$, his training at the Biometric Laboratory naturally 
suggested that he should try to use a Pearson curve for the unknown distribution; 
a type II curve was the only one suitable, and therefore in his own simply 
expressed phrase, “working from $y = y_0(1 - x^2)^0$ for samples of 4 I guessed the 
formula” 

$$y = y_0 (1 - x^2)^{(n-4)/2}. \tag{10}$$

He then showed that for n = 8 this formula represented his empirical sampling 
distribution very well, and pointed out that the result agreed with large sample 
theory, since the standard deviation $\sigma_r = 1/\sqrt{n - 1}$ would equal Pearson and 
Filon’s value of $(1 - R^2)/\sqrt{n}$ when R = 0 and $n \to \infty$. He also gave the correct 
limiting result, which he had been able to establish for any R, when n = 2, 
suggesting that this might furnish a clue for the distribution when n > 2. It was 
a brilliant piece of guessing and all the more striking because of the forceful way 
in which the supporting evidence was marshalled. 
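
The agreement with large sample theory can be verified by numerical integration. The Python sketch below (the function names and the number of integration steps are arbitrary choices) takes the guessed curve, with ordinate proportional to (1 − x²)^((n−4)/2) on the range −1 to +1, and computes its standard deviation for several values of n for comparison with 1/√(n − 1).

```python
import math


def r_curve(x, n):
    """Guessed curve with y0 = 1: (1 - x^2)^((n - 4) / 2)."""
    return (1.0 - x * x) ** ((n - 4) / 2.0)


def r_standard_deviation(n, steps=20000):
    """Standard deviation of r under the guessed curve (midpoint rule)."""
    h = 2.0 / steps
    area = 0.0
    second_moment = 0.0
    for i in range(steps):
        x = -1.0 + (i + 0.5) * h       # midpoint of the i-th strip
        y = r_curve(x, n)
        area += y * h
        second_moment += x * x * y * h
    return math.sqrt(second_moment / area)


results = {n: r_standard_deviation(n) for n in (4, 8, 30)}

for n, sd in results.items():
    print(
        f"n = {n:>2}: s.d. of r = {sd:.5f}, "
        f"1/sqrt(n - 1) = {1.0 / math.sqrt(n - 1):.5f}, "
        f"1/sqrt(n) = {1.0 / math.sqrt(n):.5f}"
    )
```

For n = 4 the curve is the rectangle, and in each case the standard deviation agrees with 1/√(n − 1), which approaches the Pearson and Filon value 1/√n as n increases.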

In the case where the population correlation, R, was not zero Gosset provided 
three empirical sampling distributions for the cases R = 0.66 and n = 4, 8 and 30. 


* He used R for the population correlation; the notation, $\rho$, seems to have been first used by 
H. E. Soper (1913). 

Distribution of the correlation coefficient in samples of 4, tabled in Gosset’s notebook. 
Above, R = 0.66; below, R = 0. 

He also set out very clearly the conditions which his work showed must be 
satisfied by the true distribution. “I hope”, he concluded, “they may serve as 
illustrations for the successful solver of the problem”. Six years later R. A. 
Fisher was able to demonstrate the substantial accuracy of all Gosset’s predictions 
both in the r and the z paper. 

In the notebook containing the original samples of 4 from Macdonell’s 
correlation distribution, there are given what I think must be the original 
distributions built up by Gosset as he tabled his calculated values of r. Two of these 
are shown in facsimile in Plate III (n = 4, R = 0.66 and n = 4, R = 0). It is hard to 
believe that Gosset did not experience a very pleasurable excitement as these 
distributions gradually took shape on the paper, for he was exploring a region 
entirely unmapped and the discovery of the rectangular distribution in the case 
when R=0 must have been a complete surprise.* 

One of the curious things that must strike us now about these two papers of 
Gosset’s (2, 3) is the small influence that their publication had for a number of 
years on current statistical literature and practice. The z-test was used in the 
brewery at once, but I think very little elsewhere for probably a dozen years. 
Perhaps because he realized that it showed how little reliability could be placed 
on a correlation coefficient based on small numbers, Gosset does not seem to 
have recommended the use of the r-test even to his colleagues and he made no 
tables of the probability integral for the distribution (10). I have come across, 
however, one reference to the work in a letter of 3 April 1912 to E. S. Beaven, 
in which the following remarks occur: 

By the way, don’t be too cock-a-hoop about your 0.95 correlation with 7 cases. Such a 
thing might occur more than once in a hundred trials of 7 cases, even if there were no 
correlation. (I haven’t got tables to evaluate 

$$\int_{\sin^{-1} 0.95}^{\pi/2} \cos^4\theta \, d\theta \Big/ \int_0^{\pi/2} \cos^4\theta \, d\theta,$$

but you get that fraction of N at each end 0.95 or over in N trials); and I guess its about 
2% at each end.† All the same it seems very reasonable to suppose that it is right. 
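
The ratio that Gosset had no tables to evaluate can be obtained by direct numerical integration. The sketch below is an illustration rather than anything Gosset wrote. It assumes that the expression in the letter is the ratio of $\int \cos^4\theta\,d\theta$ over $(\sin^{-1} 0.95, \pi/2)$ to the same integral over $(0, \pi/2)$, which is what the distribution $(1 - r^2)^{3/2}$ for n = 7 gives on putting r = sin θ. The function names and the use of Simpson’s rule are choices made for this example.

```python
import math


def cos4_integral(a, b, steps=10000):
    """Simpson's rule for the integral of cos^4(theta) from a to b (steps even)."""
    h = (b - a) / steps
    total = math.cos(a) ** 4 + math.cos(b) ** 4
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * math.cos(a + k * h) ** 4
    return total * h / 3


def gosset_fraction(r0=0.95):
    """Ratio of the tail integral beyond r0 to the integral over (0, pi/2)."""
    lower = math.asin(r0)
    return cos4_integral(lower, math.pi / 2) / cos4_integral(0.0, math.pi / 2)


if __name__ == "__main__":
    print(gosset_fraction())
```

On this reading the ratio comes out at roughly 0.001, far below the 2% that Gosset guessed.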

* Yet Mrs Gosset, who was helping him at the time, writes: “Whatever thrill he may have got 
out of that experiment he showed nothing whatever of it, and his amanuensis never realized that 
there was anything original about it!” 

† Gosset was wrong here. The fraction is actually 0.001. 

From Gosset’s point of view, he had developed the tools which he needed 
for practical application in Dublin and he was not primarily interested in their 
wider use. If Pearson failed to realize the importance of the work and did not 
assimilate the results into current practice and teaching, it was because he too 
was mainly interested in what appeared to be of value in the research investigations 
of his laboratories. To him all small sample work was dangerous and 
should be avoided. But it would be wrong to suppose that there was a lack of 
sympathy between the two; except at a far later stage when opposite views over 
z found their way into print, Pearson’s attitude towards Gosset’s small sample 
work was one of humorous protest, well conveyed in the quotation I have given 
about “naughty brewers” who take n too small (p. 218 above). The readiness 
with which he would talk to Gosset over his problems and at times refer to him 
on matters of difficulty shows how highly he rated his ability and insight. 
Although Gosset launched off along independent lines of investigation directly 
he had mastered the elements of statistical theory, it is clear that he owed a 
great deal to the early guidance that he received in London. In the first place 
he had that very great advantage of being freed for a year from his official duties 
and of spending that time in close contact with persons who were enthusiasts in 
the study of statistics. Although, as he wrote at a later date, “I am bound to 
say that I did not learn very much from his [K. P.’s] lectures; I never did from 
anyone’s and my mathematics were inadequate for the task”, he obtained from 
the Biometric Laboratory a number of things which were not to be found in 
Airy or Merriman: the theory of correlation, the $\chi^2$-test, and above all Pearson’s 
system of frequency curves. It is doubtful for instance if he could have reached 
the distribution of $s^2$, and hence that of z, if he had not had available for use 
Pearson’s type III curve. 

After his year in London was over Gosset kept in close touch with Pearson 
for 29 years, and to his intimate friends would speak with admiration of his 
teacher. Some sentences which he spoke at the opening meeting of the Industrial 
and Agricultural Research Section of the Royal Statistical Society in November 
1933 were composed, I know, with this aspect of the relationship between 
professor and student in mind: 


Another point arises from the peculiar nature of statistics. It is impossible to apply 
statistical methods to industry or anything else unless one has a certain amount of intelligent 
experience as a background. That works both ways. The practical man has to go and 
talk to his Professor partly in order that the Professor himself should share his experience. 
... The whole art of statistical inference lies in the reconciliation of random mathematics 
with biassed samples. Every new problem has some fresh kind of bias and might contain 
some new pitfall. The only way not to fall into these pitfalls is to talk over the problem 
with some intelligent critic; and so the practical man, if he is not entirely foolish, talks over 
his problems with the Professor, and the Professor does not consider himself to be a com- 
petent critic unless he has had some experience of applying the statistics to industry and 
has learned the difficulties of that application. 


MISCELLANEOUS PAPERS, 1909-21 


Before considering the very important part that Gosset played in the 
development of agricultural experimentation, it is desirable to give a brief 
account of six papers on a variety of subjects which were published in Biometrika 
between 1909 and 1921. 

(i) The first of these papers on “The distribution of the means of samples which 
are not drawn at random” (4, 1909) dealt with one aspect of that theme which, 
as I have already mentioned, runs through so much of his work. He had realized 
at an early date how frequently there existed a correlation between successive 
observations either in time or space. Thus if x and y are two contiguous 
observations it would follow that 

$$\sigma_{x+y}^2 > \sigma_x^2 + \sigma_y^2 > \sigma_{x-y}^2.$$

Hence if x and y were successive duplicate chemical analyses of the same 
quantity their mean would be less reliable than we should expect on the usual 
theory of random sampling. On the other hand were x and y the yields from 
plots of two different cereals which were to be compared, by placing the plots 
side by side in space, the difference x − y would be more reliable than on the 
classical error theory. In this paper he considers the distribution of the mean 
not of two but of n observations, so selected that they are correlated, i.e. more 
like one another than individuals randomly selected from the population. It is 
the problem of fraternal correlation which Pearson had termed homotyposis in his 
biometric work. Gosset obtained the second, third and fourth moments of the sample 
mean, the second having the value 

$$M_2 = \frac{\sigma^2}{n}\{1 + (n-1)\rho\}, \tag{11}$$

where σ is the population standard deviation of x and ρ the correlation between 
the x’s in a sample, which Fisher has termed the intraclass correlation. From the 
values of the third and fourth moments he deduced that in general it was likely 
that the distribution of the mean would tend to normality less rapidly than when 
ρ = 0. 
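
The second moment (11) follows at once from summing the variances and covariances of the n observations. As a hedged illustration (the function names are invented for the purpose), the short sketch below computes the variance of the mean both from the closed form, taken as $\sigma^2\{1 + (n-1)\rho\}/n$, and by direct summation over an equicorrelated covariance matrix.

```python
def variance_of_mean(sigma, n, rho):
    """Closed form: sigma^2 / n * {1 + (n - 1) * rho}."""
    return sigma ** 2 / n * (1 + (n - 1) * rho)


def variance_of_mean_direct(sigma, n, rho):
    """Sum all n*n covariances of equally correlated observations, divide by n^2."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += sigma ** 2 * (1.0 if i == j else rho)
    return total / n ** 2


if __name__ == "__main__":
    # Duplicate analyses (n = 2) with correlation 0.5, against the random-sampling value.
    print(variance_of_mean(1.0, 2, 0.5), variance_of_mean(1.0, 2, 0.0))
```

With n = 2 and ρ = 0.5 the variance of the mean is 0.75σ² against 0.5σ² under random sampling, which is the loss of reliability in duplicate analyses that Gosset warned of.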

From the practical point of view he was concerned to warn the chemist that 
“repetition of analyses in a technical laboratory should never follow one another, 
but an interval of at least a day should occur between them. Otherwise a 
spurious accuracy will be obtained which greatly reduces the value of the 
analyses”. 

(ii) The next paper (6) published in 1913 dealt with “The correction to be 
made to the correlation ratio for grouping”, an investigation no doubt connected 
with Pearson’s work (1913) on the same subject published in the same number 
of Biometrika. 

(iii) Volume X of Biometrika (1914) contains a short note on “The elimination 
of spurious correlation due to position in time or space” (7). In this, Gosset 
showed that the difference correlation method used by F. E. Cave (1904) and 
R. H. Hooker (1905) could be extended to differences of higher order than the 
first. This paper was the basis of later investigations on the variate difference 
correlation method. 

(iv) In 1917 (8) Gosset published an extension of his tables of the probability 
integral of z; the range covered now ran from n = 2 to n = 30. In the introductory 
remarks he again gave advice “as to the best way of judging the 
accuracy of physical or chemical determinations”. He wrote: 

After considerable experience, I have not encountered any determination which is not 
influenced by the date on which it is made; from this it follows that a number of determina- 
tions of the same thing made on the same day are likely to lie more closely together than if 
the repetitions had been made on different days. It also follows that if the probable error 
is calculated from a number of observations made close together in point of time much of 
the secular error will be left out and for general use the probable error will be too small. 
Where then the materials are sufficiently stable, it is well to run a number of determinations 
on the same material through any series of routine determinations which have to be made, 
spreading them over the whole period. 

(v) Gosset’s paper of 1919 (9) on “An explanation of deviations from 
Poisson’s law in practice” answered some questions regarding the relation of this 
series to the positive and the negative binomial raised by Lucy Whittaker (1914) 
in a paper published five years earlier from the Biometric Laboratory. Since 
the rather severe criticisms of the latter paper directed against the applications 
of the Poisson law made by Bortkiewicz and Mortara might have discouraged its 
use in other directions, Gosset pointed out that the object of his own earlier 
paper (1) was to give the user of the haemacytometer a guide to the error of his 
count. From this first practical point of view it made little difference whether, 
theoretically, the better fitting distribution was a positive or negative binomial, 
although as a further point it was of interest to consider what such departures 
implied if the data were sufficient to establish them. 

(vi) The final paper (10) of this group on “An experimental determination of 
the probable error of Dr Spearman’s correlation coefficients”, was written in the 
first instance for reading at one of the early meetings (13 December 1920) of the 
newly formed Society of Biometricians and Mathematical Statisticians. Gosset 
had many years before realized the value of the method of rank correlation in 
assessing quickly the order of relationship between two short series of numbers. 
Probably while working at the Biometric Laboratory he had developed the proof 
quoted by Pearson (1907, p. 13), that the standard error of the coefficient 

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \tag{12}$$

where d is the difference between the two ranks of an individual, 
is $1/\sqrt{n-1}$ in the case of independence in the population. In a Report written 
in 1911 for his colleagues in the brewery he illustrated the use of the method and 
gave what is substantially the correction for “ties” described in the present 
paper of 1921. Apart from the publication of this correction, the paper is of 
interest because Gosset again made use of his sampling experiment of 1907. 
For the 375 samples of 8 from a population having correlation 0-66 he calculated 
both of Spearman’s rank correlation coefficients, in their raw and corrected 
form and, in the case of his 100 samples of 30 added Sheppard’s estimate of 
correlation obtained from a median fourfold division. He uses these results to 
make a number of comparisons between the methods, in particular paying 
regard to the amount of additional sampling needed if one of these more rapid 
methods of “assay” is to give as reliable an estimate of the population correlation 
coefficient as that obtained from the usual product-moment formula. He 
concludes by suggesting to mathematicians a problem which has still remained 
unsolved, that of determining the sampling distribution of the rank coefficient 
of equation (12) above, in random samples from a bivariate normal population, 
in which the correlation is not zero. 
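
For readers who wish to try the rank method on a short series, the following sketch computes the coefficient of equation (12), taken here in its usual form $1 - 6\sum d^2 / \{n(n^2 - 1)\}$ with d the difference between paired ranks. It is an illustration only: the function names are invented, and the sketch assumes there are no tied values, so that Gosset’s correction for ties is not needed.

```python
import math


def ranks(values):
    """Ranks 1..n of the values, assuming no ties."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result


def spearman_rho(x, y):
    """Rank correlation 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def standard_error_under_independence(n):
    """Standard error 1 / sqrt(n - 1) when the two series are independent."""
    return 1 / math.sqrt(n - 1)


if __name__ == "__main__":
    print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
    print(standard_error_under_independence(5))
```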


THE APPLICATION OF STATISTICAL METHOD TO AGRICULTURAL PLOT 
EXPERIMENTS 


It is a feature commonly noticeable in the advance along any new line of 
scientific inquiry that the first steps in that progress are made hesitatingly and 
with difficulty, accompanied by much trial and error; and then after many years 
of what seems, looking back, to have been a painfully slow advance to an 
obvious goal, a stage is reached where the way forward has been almost cleared 
so that the introduction, perhaps, of some new tool or some fresh personality 
leads to a rapid advance into fresh country. In later years the casual student 
may well attribute the beginning of an epoch to that moment of rapid advance, 
partly because few records of the earlier struggle have found their way into 
print and partly because the later workers themselves have hardly realized the 
amount of thought that has gone into the creation of ideas which have formed 
the groundwork of their own further progress. 

The history of the introduction of statistical methods in the planning and 
interpretation of agricultural experiments provides an illustration of these 
points. The large extension of technique with the accompanying stimulus to 
scientific planning which followed R. A. Fisher’s introduction of the methods of 
analysis of variance in the years following 1923, may have caused the present- 
day statistician to overlook the essential pioneer work of the preceding years, 
without which it is certain that the later advance would have been impossible.* 
It therefore seems appropriate to take this opportunity of giving rather special 
attention to this aspect of Gosset’s contribution to statistics and to do so by 
following out the gradual stages by which he advanced from simple beginnings 
to the analysis of a balanced block experiment. 

* Fisher himself has on many occasions paid a warm tribute to the help he received both from 
“Student’s” published work and from correspondence and discussion. 

A number of persons contributed to this early work and, as is often the case 
when methods of attack are in an imperfect or trial stage, ideas were worked 
out in correspondence or by word of mouth rather than in print. The brewery, 
as a very large consumer of barley, was naturally interested in agricultural 
problems and in particular in certain large-scale experiments undertaken in 
Ireland under the supervision of the Irish Department of Agriculture. Gosset 
was not, however, concerned with giving advice in these experiments till a 
number of years after he had specialized in statistics, and I think his first real 
interest in agricultural work arose from his contact with E. S. Beaven, who as 
a maltster was from time to time in Dublin on official business. Beaven had 
started experimental work in the nineties and about 1905 approached Gosset for 
an interpretation of apparently anomalous results, afterwards seen to be due to 
interference, that he found in comparing the yields of two varieties of barley in 
his ‘cage’ at Warminster. From that date until Gosset’s death there was a 
continuous flow of correspondence between them in which ideas were exchanged 
and thrashed out, and the more mathematical approach of the younger man was 
influenced by the practical experience of his older friend. 

It will be noticed that three out of Gosset’s four illustrations in the paper on 
the probable error of the mean (2) deal with agricultural topics; the data were 
taken from published accounts of Woburn farming experiments and Gosset 
shows how, by taking appropriate differences and using his z-test, a more precise 
interpretation of such results could be obtained than had hitherto seemed 
possible. Beaven was in touch with the agricultural work both at Rothamsted 
and Cambridge and it was no doubt owing to his report of Gosset’s keen 
interest in these problems that both of those classical papers by Wood & 
Stratton (1910) and by Mercer & Hall (1911), dealing with the analysis of what 
we now term uniformity trial data, passed through Gosset’s hands before 
publication. The first was only “an affair of a day or two’s glancing at” after 
which he “made one or two suggestions, most of which were quite rightly turned 
down as being too refined for the purpose”.* But in the second case he “brooded 
over the paper for months”, and made suggestions which were incorporated, as 
well as adding an Appendix (5). If we compare the two statistical contributions, 
that of Stratton to the first paper and that of Gosset to the second, it is possible, 
I think, to see without difficulty the latter’s special contribution to the subject. 
Stratton is following the approach of the classical theory of errors, which he had 
learnt and applied as an astronomer; he shows that variation in plot yields can 
be represented by the error curve and hence that the results of that theory 
regarding the probable error of a mean are applicable. These results are used to 
show the relation of size and number of plots (or animals) to the reliability of 
the results. No reference is made to “Student’s” paper of 1908. 

* These quotations come from a letter of 4 June 1922 from Gosset to Beaven. 

Gosset, writing his Appendix a year later, brings to the problem the added 
insight that he has gained from an understanding of correlation theory and 
from much discussion of the Warminster results with Beaven. He shows how it 
is possible to bring the changing fertility level or “patchiness” of the experimental 
field into service (a) by scattering the varieties to be compared in small 
plots over the field, and (b) then taking as the statistical variable for analysis the 
difference between the characteristics of two varieties on neighbouring plots. 
Thus the standard error, by way of formula (2), p. 212 above, can be very much 
reduced. The illustration which he gives deals only with the case of two varieties 
A and B, and at this date he had probably not thought out a technique for dealing 
with more comparisons. 

There is another point of difference that may perhaps be noted; Wood and 
Stratton by raising the question, “What is the probable error of a single field 
experiment?” seemed to suggest that it might be possible to determine a 
single value, r, which it would be appropriate to apply to future experiments of 
a given type. Gosset however emphasized a rather different idea. He writes 
(5, p. 130): 


But, it will be asked, why take all this trouble? The error of comparing plots of any 
given size has been found by the authors of the paper, and all that has to be done is to apply 
this knowledge to the particular set of experiments. 

The answer to this is that there is no such thing as the absolute error of a given size of 
plot. We may find out the order of it, be sure perhaps that it is not likely to be less than 
(say) 5 per cent. nor more than 15 per cent... . but the error of a given size of plot must vary 
with all the external conditions as well as with the particular crops upon which the experi- 
ment is being conducted, and it is far better to determine the error from the figures of the 
experiment itself; only so can proper confidence be placed in the result of the experiment.* 


His own z-distribution was available, if the number of observations was 
scanty. 

If the field were divided into m pairs of plots and $x_i$ and $y_i$ were the yield, 
say, of varieties A and B on the ith pair of contiguous plots, then Gosset’s test for a 
difference in yield may be summarized as follows: 

Write $d_i = x_i - y_i$ and $\bar d = \sum d_i/m$. 

Calculate the ratio 

$$z = \bar d \Big/ \sqrt{\sum (d_i - \bar d)^2 / m} \tag{13}$$

and if m < 10 refer this to the z-tables (2, p. 19). Otherwise, if m > 10, since z 
has a standard deviation of $1/\sqrt{m-3}$, refer $z\sqrt{m-3}$ to Sheppard’s tables of 
the normal probability integral. 
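
The arithmetic of the difference test is short enough to set out as a sketch. The code below is an illustration under stated assumptions, not Gosset’s own working. It takes z to be the mean difference divided by the standard deviation of the differences computed with divisor m (so that the modern t is $z\sqrt{m-1}$), and the function names and the toy yields are invented for the example.

```python
import math


def student_z(x, y):
    """Mean paired difference divided by the S.D. of differences (divisor m)."""
    m = len(x)
    d = [a - b for a, b in zip(x, y)]
    d_bar = sum(d) / m
    s = math.sqrt(sum((di - d_bar) ** 2 for di in d) / m)
    return d_bar / s


def normal_deviate(z, m):
    """For larger m, scale z by sqrt(m - 3) before entering normal tables."""
    return z * math.sqrt(m - 3)


if __name__ == "__main__":
    # Invented yields for varieties A and B on three pairs of neighbouring plots.
    a = [3.0, 5.0, 7.0]
    b = [2.0, 3.0, 4.0]
    print(student_z(a, b))
```

Because each difference is taken between neighbouring plots, the fertility level common to a pair cancels before the standard deviation is computed, which is the source of the gain in precision.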

In the years 1912 and 1913 at Beaven’s suggestion plot experiments of 
similar design, each comparing eight varieties of barley, were carried out at 
three centres, viz. Warminster, Cambridge and Ballinacurra in Co. Cork. The 
experiments were carried out in cages, and there were twenty replications of 
each of the eight varieties in square-yard plots. The arrangement of the 
varieties in a “chess-board” pattern was effectively what we should now term 
balanced; a plan of one of the schemes has been shown in Gosset’s paper of 
1923 “On testing varieties of cereals” (11, p. 277) and I have reproduced a portion 
of this below, only adding some thicker rules to separate the different sets 
of eight plots. 


* The italics in the passage quoted above are mine. 

[Fig. 1. Portion of a chess-board plan of eight barley varieties (A to H) with plot yields, 
from Gosset’s paper (11, p. 277); thicker rules separate the sets of eight plots.] 

Beaven suggested that the results might be analysed by using as a statistical 
variable the difference between (1) the yield on a plot of A, say, and (2) the mean 
yield for the eight varieties (including A) on the 9-plot area in which this A-plot 
lay at the centre.* This was a rough and ready procedure but, as Gosset pointed 
out, owing to correlation there would be difficulty in the statistical interpretation. 
The method which he preferred was a very natural extension of his 
difference method advocated in the case where there were only two varieties. 
He could still clearly use that method to compare any two of the eight varieties, 
say A and D, taking the corresponding pair of plots from each set of eight, and 
differencing the character measured, although the plots would not now be 
generally contiguous. This would mean that changes in soil fertility, etc. would 
make the comparison less accurate than before,† but that could not be helped 
if eight varieties were to be compared in a single experiment in place of two. 
He saw, however, that it was possible to compensate to some extent in another 
direction for this loss in accuracy, by getting a single combined estimate of 
error from all the ½n(n − 1) = 28 possible sets of differences between n = 8 varieties, 
a method which he described as “hotchpotching” the comparisons. The reasoning 
which he used in reaching his result may be set out as follows: 

* One variety would appear twice in this mean and its yield must be suitably weighted. 

† Gosset at a later date made comments on this point and on the assumption involved in 
getting a pooled estimate of standard errors that might differ; see (11, pp. 285 and 282). 

Let there be n varieties each repeated m times and denote by $d_{uv.i}$ the 
difference obtained from the ith comparison of the uth and vth varieties 
(i = 1, 2, ..., m) and by $\bar d_{uv}$ the mean of these m differences. Thus in Fig. 1, if 
u and v stand for varieties A and D, respectively, then 

$$d_{uv.1} = 236.5 - 255.9 = -19.4, \quad d_{uv.2} = 222.6 - 295.8 = -73.2, \text{ etc.}$$

To obtain a common estimate of the standard deviation of differences, say σ, 
proceed now, he argued, as follows: (1) calculate the ½n(n − 1) possible values of 
$\sum_i (d_{uv.i} - \bar d_{uv})^2/m$; (2) multiply each by a factor m/(m − 1) so that its 
expectation becomes $\sigma^2$; (3) sum these quantities and divide by their number. 
Thus the final estimate of $\sigma^2$ becomes 

$$s^2 = \frac{2}{n(n-1)(m-1)} \sum_{u<v} \sum_i (d_{uv.i} - \bar d_{uv})^2. \tag{14}$$


As I shall explain later, this is exactly the estimate which would now be 
used, only it would be calculated in a more direct manner. The division of 
Beaven’s plots into sets of eight which I have shown in Fig. 1, would to-day 
be termed a division into blocks (though the blocks are not similar in shape), 
and the arrangement of the different varieties within a block would be called 
balanced rather than random. Thus already in 1912 Beaven and Gosset together 
had gone a long way towards reaching one form of the present-day experimental 
technique. 

Having obtained the estimate $s^2$ of (14), Gosset was then able to consider the 
significance of the difference between any pair of varieties by calculating the 
ratio 

$$\frac{\bar d_{uv}\sqrt{m}}{s} \tag{15}$$

and referring to Sheppard’s tables.* His method was to place the eight varieties 
in order of magnitude of the character under consideration and, by applying the 
test as a foot-rule to selected differences, draw reasoned conclusions as to the 
existence or absence of real variety differences. A test (R. A. Fisher’s z-test) 
which would determine whether as a whole the eight variety means differed 
significantly would clearly have been useful, but sound common sense could make 
the difference test yield reliable results. 

This method was applied to the English and Irish chess-board results; the 
computation was lengthy and many pages of a large notebook of Gosset’s are 
filled with the calculations. G. U. Yule carried out the Cambridge computations 
in consultation with Gosset. But, however laborious the work, the conclusions 
obtained from the analysis combined with results of large scale tests played an 
important part in securing the steady improvement that was being effected in 
the quality of Irish grown barley. 

It is perhaps of historical interest to note a more general formula that 
Gosset was using at this time to obtain a common estimate of standard devia- 
tion from data classified into a number of groups with possibly different means. 
The formula would not now be regarded as satisfactory, but it illustrates well the 
slow progress of the human mind to its final goal. 

* The common estimate, $s^2$, of (14) is based on so many observations that Gosset probably had 
not considered whether $\bar d_{uv}/s$ could be referred to the z-distribution. 

† I have taken the expression from a letter of 1912 to Beaven. 

Suppose that N observations of a variable x are divided into n groups of 
unequal size, that $x_{ti}$ is the ith observation in the tth group; further that $m_t$ is 
the number and $\bar x_t$ the mean in that group. Then Gosset took as an estimate of 
a supposed common within-group variance, $\sigma^2$, the expression 

$$s^2 = \frac{1}{N} \sum_{t=1}^{n} \frac{m_t}{m_t - 1} \sum_{i=1}^{m_t} (x_{ti} - \bar x_t)^2. \tag{16}$$

Since the expectation of $\sum_i (x_{ti} - \bar x_t)^2$ is $(m_t - 1)\sigma^2$ and $N = \sum_t m_t$ it will be seen 
that the expectation of $s^2$ is $\sigma^2$. Except in the case where $m_t$ is the same for 
every group, which was the case he was concerned with in the chess-board 
analysis, the factors weighting the sums of squares are not, however, those 
which we now know give an estimate of $\sigma^2$ having minimum sampling error. 
When however $m_t = m$ his estimate assumed the correct form 

$$s^2 = \frac{1}{n(m-1)} \sum_{t=1}^{n} \sum_{i=1}^{m} (x_{ti} - \bar x_t)^2. \tag{17}$$

Had he applied formula (16) to the chess-board problem in a case where the 
number of plots was not the same for all varieties, his final estimate would have 
been less satisfactory. 
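
The difference between Gosset’s weighting and the present-day one is easily seen numerically. The sketch below is an illustration with invented function names and toy data. It takes formula (16) to be $\frac{1}{N}\sum_t \frac{m_t}{m_t - 1}\sum_i (x_{ti} - \bar x_t)^2$ and compares it with the usual pooled estimate, in which the within-group sums of squares are added and divided by N − n.

```python
def within_ss(group):
    """Sum of squared deviations of a group from its own mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)


def gosset_pooled(groups):
    """Formula (16) as read here: each group's sum of squares weighted by m_t / (m_t - 1)."""
    big_n = sum(len(g) for g in groups)
    return sum(len(g) / (len(g) - 1) * within_ss(g) for g in groups) / big_n


def modern_pooled(groups):
    """Usual pooled within-group variance: total sum of squares over N - n."""
    big_n = sum(len(g) for g in groups)
    return sum(within_ss(g) for g in groups) / (big_n - len(groups))


if __name__ == "__main__":
    equal = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
    unequal = [[1.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
    print(gosset_pooled(equal), modern_pooled(equal))
    print(gosset_pooled(unequal), modern_pooled(unequal))
```

With equal group sizes the two estimates coincide; with unequal sizes both remain unbiased, but the small groups receive more weight in Gosset’s form than the minimum-variance estimate would give them.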

During the war period of 1914-19 the analysis of the chess-board results was 
discontinued. In 1920 Gosset took over responsibility for the statistical aspects 
of the barley experiments conducted at a number of centres by the Irish 
Department of Agriculture, and this made him particularly interested in the 
possibilities of Beaven’s new half drill strip method of arrangement. Corre- 
spondence with Beaven is full of discussion of the possibilities of this method 
and of the best way of analysing the results. At the same time he was in touch 
with R. A. Fisher who was beginning to turn his great mathematical powers to 
similar problems at Rothamsted. 

The next reference I can find to the chess-board analysis is early in 1923, 
when Beaven had asked Gosset to explain again the procedure he had used ten 
years before. The final lap of the long passage to an “‘analysis of variance” is of 
sufficient historical and personal interest to place on record. On 29 March 1923 
Gosset writes: 


I enclose a note on the chess-board error. I was using the formula before the war and 
see no reason to repent of it. I am writing Fisher asking him to look it over and if necessary 
criticize. 

The method given is that which I have described above, involving the calculation 
of the ½n(n − 1) squares of differences. It was naturally a lengthy procedure, 
and I find a brief note of Beaven on the papers, after working through an 
example: “Conclusion (if any possible) from above is that P.E. with chess-boards 
might be guessed at almost as well as calculated.” It needed a “Student” with 
his facility for doing calculations in spare moments on the back of an envelope to 
cope with such computations. But the author of the method himself was not 
content and on 9 April in the second half of a letter started on the 6th, he writes 
again to Beaven: 


Since writing the above I have had a vision on the subject of chess-board error and 
enclose a rough proof of my new method. I have written to Yule asking him whether he is 
in fact working at chess-board error and enclosing a similar proof. If he is not I shall be 
inclined to write it up and shall ask your leave to use the No. 1 chess-board of 1913 as an 
illustration. If he is, he has doubtless got something as good or better, and he can put 
mine in the W.P.B. 

To use my new method with 15 plots, each of 8 varieties (1) find the square of the S.D. 
of the whole 120 plots, $\Sigma^2$; (2) after calculating the averages of the eight varieties, find the 
square of the S.D. of these eight figures, $\sigma_n^2$; (3) after calculating the averages of the fifteen 
groups of eight, find the square of the S.D. of these fifteen figures, $\sigma_m^2$. Then the P.E. of the 
error of a comparison should be 

$$\sqrt{\frac{2}{15} \cdot \frac{120\,(\Sigma^2 - \sigma_n^2 - \sigma_m^2)}{120 - 8 - 15}}.*$$

In calculating the S.D.’s do not use the (m − 1) divisor. 


The “rough proof” of the method which he enclosed was as follows: it will 
be seen to be on similar lines to that given in the paper “On testing varieties 
of cereals” (11, pp. 282-3) except for the omission of the term $-\sigma^2/mn$ referred 
to in the published paper, which resulted in a divisor of mn − m − n instead of 
mn − m − n + 1. 


Memorandum 


Let m plots of each of n varieties be chessboarded. There will be m groups each 
containing one of each of the n varieties. If $\Sigma^2$ be the variance of the mn plots, it may be 
considered to be composed of three parts which as a first approximation may be taken as 
uncorrelated: 

(1) The real differences between the varieties, $\sigma_1^2$, 

(2) The errors common to each group of n, $\sigma_2^2$, 

(3) The remaining casual errors, $\sigma_3^2$. 

Of these the last is the only part that affects the comparison of varieties since the differences 
which we intend to measure compose (1), and (2) is eliminated by the process of 
chess-boarding. 

It remains to find the best estimate of (1), (2) and (3) given $\Sigma^2$, the averages of the n 
varieties, and those of the m groups. 

Now if $\sigma_n$ be the S.D. of the averages of the n varieties 

$$\sigma_n^2 = \sigma_1^2 + \frac{\sigma_3^2}{m},\dagger$$

and if $\sigma_m$ be the S.D. of the averages of the m groups 

$$\sigma_m^2 = \sigma_2^2 + \frac{\sigma_3^2}{n}.$$

Also 

$$\Sigma^2 = \sigma_1^2 + \sigma_2^2 + \sigma_3^2.$$

Hence 

$$\Sigma^2 - \sigma_n^2 - \sigma_m^2 = \sigma_3^2\left(1 - \frac{1}{m} - \frac{1}{n}\right).$$

Whence the others follow, and the error of a comparison between a pair of varieties is 

$$\sqrt{\frac{2}{m}}\,\sigma_3 = \sqrt{\frac{2n\,(\Sigma^2 - \sigma_n^2 - \sigma_m^2)}{mn - m - n}}.$$

* This is the P.E. of the difference between two means of fifteen plots. It must be squared and 
multiplied by m = 15 to get into the form of (18) below. [E. S. P.] 

† The expression on the right-hand side should have been $\sigma_1^2 + \sigma_3^2(n-1)/mn$; this is equal to the 
expectation of $\sigma_n^2$. Similar corrections to the $\sigma_3^2$ term are required in the next two equations. 
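
The short cut of the memorandum can be tried on a small table. The sketch below is an illustration only: the yields are invented (they are not Beaven’s figures) and the function names are hypothetical. It computes the variance of all mn plots, of the n variety averages and of the m group averages, each with divisor equal to the number of figures as Gosset directs, and forms mn times the total variance less the other two, divided either by his original mn − m − n or by the corrected mn − m − n + 1. With the corrected divisor the result equals the residual mean square of a two-way analysis of variance.

```python
def population_variance(values):
    """Variance with divisor equal to the number of values (no m - 1 divisor)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def shortcut_error_variance(y, corrected=True):
    """y[u][i] is the yield of variety u in group i; returns the casual-error variance."""
    n, m = len(y), len(y[0])
    total = population_variance([v for row in y for v in row])
    varieties = population_variance([sum(row) / m for row in y])
    groups = population_variance([sum(y[u][i] for u in range(n)) / n for i in range(m)])
    divisor = m * n - m - n + (1 if corrected else 0)
    return m * n * (total - varieties - groups) / divisor


def residual_mean_square(y):
    """Residual mean square from the usual two-way analysis of variance."""
    n, m = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (n * m)
    v_mean = [sum(row) / m for row in y]
    g_mean = [sum(y[u][i] for u in range(n)) / n for i in range(m)]
    ss = sum((y[u][i] - v_mean[u] - g_mean[i] + grand) ** 2
             for u in range(n) for i in range(m))
    return ss / ((n - 1) * (m - 1))


if __name__ == "__main__":
    # Invented yields: 3 varieties (rows) in 4 groups (columns).
    plots = [[20.0, 23.0, 25.0, 22.0],
             [24.0, 27.0, 26.0, 28.0],
             [19.0, 21.0, 24.0, 20.0]]
    print(shortcut_error_variance(plots), residual_mean_square(plots))
```

The standard error of a comparison between two variety means then follows by multiplying the variance by 2/m and taking the square root.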


In the next letter to Beaven of 20 April Gosset writes: 


Now as to chess-board error. About a week after I sent the proposed simplified method 
to you and Yule, I got a note from Fisher via Somerfield giving the same method in rather 
more technical language. Next I got a reply from Yule saying that the method was new and 
giving it his blessing more or less, and finally I got a p.c. from Fisher this morning saying 
that the divisor should be $mn-m-n+1$ not $mn-m-n$. Anyhow the thing seems to have
some weight behind it now. 

It should give the same result as my original method.... 


That the agreement between the two results depends on the identity* 


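$$\frac{2\sum_{u<v}\sum_{i=1}^{m}\left(d_{uv,i}-\bar{d}_{uv}\right)^2}{n(n-1)(m-1)} = \frac{2\sum_{u=1}^{n}\sum_{i=1}^{m}\left(x_{ui}-\bar{x}_u-\bar{x}_i+\bar{x}\right)^2}{(n-1)(m-1)}$$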


was shown by Fisher in the letter Gosset quotes in the footnote to p. 283 of his 
paper (11). The expression on the left-hand side is taken from formula (14) above, 
while that on the right represents the estimate of the sampling variance of the 
difference between two single plot yields obtained by the usual analysis of 
variance method. 
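The agreement described above can be checked numerically. The following is a minimal sketch in Python, using made-up yields; the function name `chessboard_error` and the data are illustrative assumptions and come from neither Gosset's nor Fisher's papers. It computes the single-plot error variance four ways: with the memorandum's divisor mn − m − n, with the corrected divisor mn − m − n + 1, from the residual mean square of the usual analysis of variance, and from the average variance of the within-block differences between pairs of varieties.

```python
from itertools import combinations

def chessboard_error(yields):
    """yields[i][u] is the yield of variety u in block i (m blocks, n varieties)."""
    m, n = len(yields), len(yields[0])
    grand = sum(map(sum, yields)) / (m * n)
    block_means = [sum(row) / n for row in yields]
    variety_means = [sum(row[u] for row in yields) / m for u in range(n)]

    # Variances taken with divisors mn, n and m (no "minus one" divisor).
    total_var = sum((x - grand) ** 2 for row in yields for x in row) / (m * n)
    var_varieties = sum((v - grand) ** 2 for v in variety_means) / n
    var_blocks = sum((b - grand) ** 2 for b in block_means) / m

    remainder = m * n * (total_var - var_varieties - var_blocks)
    memo = remainder / (m * n - m - n)           # divisor of the memorandum
    corrected = remainder / (m * n - m - n + 1)  # divisor with the +1 correction

    # Residual mean square from the two-way analysis of variance.
    resid_ss = sum(
        (yields[i][u] - variety_means[u] - block_means[i] + grand) ** 2
        for i in range(m) for u in range(n)
    )
    anova = resid_ss / ((m - 1) * (n - 1))

    # Mean over all pairs of varieties of the variance of their within-block
    # differences (divisor m - 1), halved to refer to a single plot.
    pair_vars = []
    for u, v in combinations(range(n), 2):
        d = [row[u] - row[v] for row in yields]
        d_bar = sum(d) / m
        pair_vars.append(sum((x - d_bar) ** 2 for x in d) / (m - 1))
    pairwise = sum(pair_vars) / len(pair_vars) / 2

    return {"memo": memo, "corrected": corrected, "anova": anova, "pairwise": pairwise}

# Hypothetical yields: 3 blocks (rows) by 4 varieties (columns).
example = [
    [10.0, 12.0, 11.0, 13.5],
    [9.0, 12.5, 10.0, 13.0],
    [11.0, 13.0, 12.5, 14.0],
]
result = chessboard_error(example)
print(result)
```

On any complete table the `corrected`, `anova` and `pairwise` values coincide, while `memo` is somewhat larger because its divisor is smaller by one.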

Fisher’s application of the method was given in a joint paper with W. A.
Mackenzie on “Studies in crop variation”, received by the Journal of Agricultural
Science on 20 March 1923 and published in July. The theory was
illustrated on an experiment with potatoes “planted in triplicate on the ‘chess-board’
system”; the arrangement of the plots was not so well balanced as in
Beaven’s chess-board and as yet no question of randomization was considered.
The paper contained what was I think the first published arrangement of
numerical data in an analysis of variance table (then described as analysis of
variation), and a method was given of testing for the significance of the treatment
(or variety) sum of squares, taken as a whole.

“Student’s” paper (11) was read before the Society of Biometricians and 
Mathematical Statisticians on 28 May 1923 and published in Biometrika in the 
following December. In obtaining the formula of the memorandum even with 
the slip which no doubt he would later have found out himself, and in the 
description of the method of procedure given to Beaven, he had so evidently
after long searching reached the essential conception of breaking up a total sum 


* In this notation $d_{uv,i}=x_{ui}-x_{vi}$, or is the difference between the uth and vth varieties in the ith
block. $\bar{x}_u$, $\bar{x}_i$ and $\bar{x}$ are the variety, the block and the grand mean respectively. There are n varieties
and m blocks.
of squares into parts* that I feel his achievement should be put on record. As
we have seen, in his modest way he was ready to have his results thrown into 
the waste paper basket, if another statistician could improve on his work! 


Whether his mathematics could ever have shown unaided that if no variety 
differences existed: 


(1) the expressions $\Sigma^2-\sigma_n^2-\sigma_m^2$ and $\sigma_n^2$ of his memorandum were independent,


(2) were each distributed in a modified form of the distribution he had 
discovered in 1908, 


(3) gave a ratio whose distribution law was a Pearson type VI curve; 


all this is doubtful. But, as he would have said himself, why speculate, these 
further results were derived by Fisher; the problem was therefore solved and 
a new chapter opened. 

The 1923 paper (11) contains much else of interest besides this handling of the
chess-board type of experiment. It starts with an historical survey of the development
of experiments aiming at the comparison of cereals and concludes with a
critical discussion of the half drill strip method. The simple theme which I have
referred to on many occasions runs through the whole and takes form in a final
concluding sentence:


It is shown that methods (2) [chess-board] and (3) [half drill strip] depend for their
accuracy on the fact that the nearer two plots of ground are situated, the more highly are
the yields correlated, so that we are able to increase the effect of the last term of the
equation

$$\sigma_{A-B}^2 = \sigma_A^2 + \sigma_B^2 - 2r_{AB}\,\sigma_A\sigma_B$$


(where A and B are the varieties to be compared) by placing the plots to be compared with 
one another as near together as possible. 
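As a small numerical illustration of the equation just quoted (a sketch in Python; the function name and the unit standard deviations are assumptions chosen for the example, not figures from the paper), the variance of the difference falls as the correlation between neighbouring plots rises:

```python
def variance_of_difference(sd_a, sd_b, r_ab):
    """Variance of (yield of A - yield of B) for plots whose yields have
    standard deviations sd_a and sd_b and correlation r_ab."""
    return sd_a ** 2 + sd_b ** 2 - 2 * r_ab * sd_a * sd_b

# Hypothetical plots with unit standard deviation and increasing correlation.
for r in (0.0, 0.5, 0.8):
    print(r, variance_of_difference(1.0, 1.0, r))
```

With equal standard deviations the variance of the difference is proportional to 1 − r, which is why bringing the compared plots closer together sharpens the comparison.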


LATER PAPERS 


In his later papers Gosset tended to avoid, as far as possible, the introduction
of mathematics and he would ask his friends to regard him as a non-mathematician.
Thus he forwarded his paper on the Lanarkshire milk experiment (17)
to Karl Pearson with the words:

I hope you will find it interesting, though its chief merit to the likes of me (that there 
are no mathematics in it), will hardly commend it to you. 

Or again, writing to me in 1926 regarding the original $\chi^2$ paper (Karl
Pearson, 1900) he remarked:

I have now read the $\chi^2$ paper in Phil. Mag. 50. It may be divided into three parts, one
that I can follow as a man who could cut a block of wood into the rough shape of a boat
with his penknife might appreciate a model yacht cut and rigged to scale, the second I can
only compare to a conjuring trick of which I haven’t got the key (such for example as the
transformation to polar co-ordinates on p. 158) and lastly quite a small part which I think I
can understand.


* His original approach to statistics through Airy’s book made this a natural way of regarding
things; see the formula (5) I have quoted above. There are points in Gosset’s proof in (11, p. 282)
also reminiscent of Airy, Theory of Errors of Observations (1875, p. 46).


When at last, after the war, an increasing number of men trained as 
mathematicians began to turn their attention to statistics, it was not perhaps 
surprising that one whose mathematical training had ceased with Oxford 
Mods. in the nineties should refuse to regard himself as a mathematician. 
Besides, the increasing responsibilities of his work as a brewer left him little time 
or inclination to follow out in detail the continuous elaboration of the theory of 
mathematical statistics. As a result, in his relatively rare publications he tended 
to concentrate on simple exposition of the function of statistical method. The 
best examples of such work are: 

(1) The paper on “Errors of routine analysis” of 1927 (15) which develops
more fully a theme he had touched on before (4 and 8), and shows how some
recent theoretical work on the distribution of “range” in small samples might
be made to give a useful working tool for the analyst.

(2) Two admirable papers on the use of statistical methods in agriculture,
both unfortunately rather inaccessible to the ordinary student: “Mathematics
and Agronomy”, 1926 (14), and the article on “Yield Trials” in Baillière’s
Encyclopedia of Scientific Agriculture 1931 (16).

This recession from the mathematical approach of his earlier papers had 
other consequences. In the first place it meant that during a period of rapid 
advance in statistical technique there was available, for almost anyone in need
of advice, a statistician of great practical experience and unusual insight, whom 
the inquirer could be sure would not be carried away by the fascination of any 
mathematical model into allowing abstract theory to step beyond its proper 
sphere. On the other hand there were certain disadvantages; Gosset’s avoidance
of a mathematical statement of his case sometimes, as in his last two papers 
(21), (22), made it difficult for others to grasp an idea or method which probably 
was clear enough in his own mind. The theory of probability is based on 
mathematics, and beyond a certain point there are dangers in introducing it 
into practice without a precise mathematical statement of the assumptions 
underlying the method of procedure. 

If we return to 1923, it is clear that Gosset welcomed with enthusiasm the 
new methods that R. A. Fisher was developing. The neatness of the arrangement
of calculations in an analysis of variance table for example, appealed to
him. It brought to the rather laborious calculation methods of his own a simplification
whose value he was quick to realize. The introduction of t as the ratio of a
deviation to an estimate of its standard error, in place of his own criterion z, and
the use of degrees of freedom, appealed to him at once because of the greater
generality; as a result he calculated extended values of the probability integral
of t to replace his old z tables and published these in 1925 (13) in conjunction
with a theoretical contribution of Fisher’s. In print and in correspondence he
emphasized the importance of randomness. “The experiments”, he wrote in
1926 (14, p. 711), “must be capable of being considered to be a random sample of
the population to which the conclusions are to be applied. Neglect of this rule
has led to the estimate of the value of statistics which is expressed in the
crescendo ‘lies, damned lies, statistics’.”

This paper of 1926 contains perhaps the extreme limit to which he ventured in 
allowing the toss of a coin or a die to decide the arrangement of plots in an 
agricultural experiment. On the last page (p. 719) he suggests the arrangement 
of four varieties in an 8 × 8 square, in which two plots of each variety are to fall
in each row and each column. Subject to this restriction the arrangement was 
to be obtained in a random manner. 
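One way of producing an arrangement of this kind can be sketched in Python. This is an illustrative construction only, not the procedure given in the 1926 paper: it permutes the rows, columns and symbols of a cyclic 8 × 8 Latin square and then merges the eight symbols in pairs, so it reaches only some of the arrangements that satisfy the restriction. The function name `restricted_square` is hypothetical.

```python
import random

def restricted_square(varieties="ABCD", copies=2, seed=None):
    """Return a square in which every variety occurs `copies` times
    in each row and in each column."""
    rng = random.Random(seed)
    k = len(varieties) * copies          # side of the square, 8 here
    rows, cols, symbols = list(range(k)), list(range(k)), list(range(k))
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(symbols)
    # (r + c) % k is a cyclic Latin square on k symbols; permuting rows,
    # columns and symbols keeps it Latin. Integer division by `copies`
    # then merges each group of `copies` symbols into one variety.
    return [
        [varieties[symbols[(r + c) % k] // copies] for c in cols]
        for r in rows
    ]

square = restricted_square(seed=1926)
for row in square:
    print(" ".join(row))
```

Nothing in such a construction prevents several plots of one variety from falling side by side, which is the difficulty taken up next.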

He must soon, however, have realized the disadvantages of such a procedure. 
If A, B, C and D represent the varieties, a possible if unlikely run of luck might 
lead to the following pattern of plots in one corner of the square: 


Fig. 2. [Diagram: one corner of the square, with several A-plots adjoining one another.]


Should this chance juxtaposition of many A-plots happen to coincide with a 
“fertility summit” or “‘depression” in the field, the resulting statistical analysis 
of plot yields might easily attribute a characteristic to the variety A which it did 
not possess. His practical mind could not accept such a state of affairs. To
know in advance that if an experiment was carried out with a particular pattern 
of plots there was quite a chance that it would be misleading, and to continue 
with this pattern—this was a course he was not prepared to follow. It was no 
compensation to be told that in the long run, if the verdict of the random toss 
was accepted and the 5% significance level of mathematical tables used in the
statistical analysis, then misleading results would be obtained only 5 times in 
100. In his own words (22, p. 366): 


It is of course perfectly true that in the long run, taking all possible arrangements, 
exactly as many misleading conclusions will be drawn as are allowed for in the tables, and 
anyone prepared to spend a blameless life in repeating an experiment would doubtless 
confirm this; nevertheless it would be pedantic to continue with an arrangement of plots 
known beforehand to be likely to lead to a misleading conclusion. 

His withdrawal from the out and out randomization position is illustrated
in his article of 1931 on “Yield Trials” (16). Here he speaks of the Latin square
arrangement as ideal in the types of experiment for which it is suited, because it
combines the elements of balance and randomness, but he is critical of the randomized
block arrangement because of the risk involved of getting misleading
results. He gives the following illustration of a balanced or equalized block
design which he had recommended to a horticultural correspondent, comparing
ten treatments with five replications:


Fig. 3. [Plan: five blocks, I to V, each of two rows of five plots, carrying the ten treatments A to J.]


In this example the assignment of treatments to plots in Block I is random, 
but each successive block has its arrangement more and more controlled, so that 
(i) each of the five columns contains one plot only of the ten varieties, (ii) A, D, 
E, F and J occur in the top row of their block three times and in the lower row 
twice, while for B, C, G, H and I the position is reversed, an arrangement as 
nearly balanced as possible for an odd number of blocks. 

In advocating the introduction of this element of balance, he did not con- 
sider that the random element could be dispensed with; but he believed that if 
a regular pattern was used to equalize the more probable variations in fertility 
there were still sufficient complications to leave the residual variations random 
enough to justify from the practical point of view the application of probability 
theory. It was here that he disagreed and was eventually forced into open 
controversy with R. A. Fisher and the Rothamsted school. 

This is not the place to enter into detail regarding the nature of this con- 
troversy, which resulted in Gosset’s last paper published a few months after his 
death (22). It is however well to emphasize that his attitude was closely related
to the type of agricultural problem with which he had had most experience, the 
development of improved strains of barley. In such a case as this he saw that 
success was only likely to result from a comparison of two or more strains in a 
number of years and in a number of different localities. Small scale investiga- 
tions must be followed by others in which the technique conformed as far as 
possible to ordinary agricultural practice. In each case some experimental plan 
was needed which would give the yields, let us say, of variety A and variety B 
on the experimental area with as little error as possible, that is to say freed from 
bias such as might be introduced by changes in fertility, patches of weed, etc. 
Provided that the error of the difference (yield of A—yield of B) could be kept 
low, he was satisfied with a knowledge of its probable upper limit and did not 
mind if he was told that the ratio of this difference to the estimate of its standard 
error in a particular experiment could not be referred with mathematical 
precision to a table of probabilities. He was interested primarily in the behaviour 
of the difference from farm to farm and year to year, and experience had shown 
him, beyond any possibility of doubt, that small scale balanced plot experiments 
followed by larger scale tests with the half drill strip method of Beaven’s, the 
purpose of which any intelligent farmer could understand, had achieved 
remarkable success in the improvement of barley. If it were argued that fully 
randomized experimental designs would have achieved the same or better 
results he would not have denied this dogmatically, but he felt doubtful on the 
point because his perusal of reports on such experiments showed to his mind an 
unduly high proportion of inconclusive results. He would also have added that 
with the staff, the ground and other facilities available in the investigations for 
whose planning he was responsible, fully randomized designs could not have been 
carried out. This was his attitude in writing the Statistical Society paper of 
1936 (21). 

In his final paper (22) he attacked his critics on their own ground by pointing 
out that in the experiment at a single station a balanced arrangement of plots 
in blocks was on the whole more likely to detect variety differences than a random 
arrangement when those differences were really large and therefore important, 
although for small differences the reverse would be true. 

The ultimate decision on these points can hardly be expected as yet; it will 
come in time, perhaps after 10 or after 20 years, when there has been ample 
opportunity for the practical experimenter, freed from the weight of authority, 
from fear of mathematics on the one side and from the fascination of a new 
technique on the other, to judge from accumulated experience what methods 
have been most worth while having regard to the results they have achieved. 

In addition to these papers on agricultural subjects a brief reference may be 
given to some other published work of the last few years: 

(1) A paper on “The Lanarkshire milk experiment”, 1931 (17); his suggestion
that the experiment should be repeated on a more precise but far less expensive
scale by using pairs of twins involves a characteristic introduction of his paired 
difference plan. 

(2) Two papers on certain implications of F. L. Winter’s selection experi- 
ments with maize 1933 (19) and 1934 (20). The plant breeder’s problem of 
improving varieties of cereal by continued selection had long been of interest to 
him in connexion with barley and in these papers he discusses the bearing of these 
experimental results upon evolutionary theory. 

(3) A number of short but suggestive contributions to the discussion of 
papers read before the Industrial and Agricultural Research Section of the Royal 
Statistical Society (see references on p. 249 below). 


EXTRACTS FROM LETTERS 


I have spoken more than once of Gosset’s correspondence; the professional 
statistician, whether he be attached to a university or research station, receives 
and expects to receive appeals for advice which will continue to increase through 
life as his circle of contacts grows. But with Gosset the position was somewhat 
different; to provide advice to correspondents all over the world was in no way
part of his job. Yet he gave that help unstintingly and unless it could be 
described as brewery business, he gave it out of his own time. Advice as to how 
to plan a particular experiment, or explanations of misunderstood points in 
statistical theory, while of extreme value at the time to the individual who 
receives them are rarely of interest to the general reader. Nevertheless, I believe 
that a few quotations from letters will add to the record of Gosset’s personality 
by showing something of his patience, his practical mind, his suggestiveness and 
his characteristic freedom of expression. 

The first quotations are taken from a long letter written to me in 1926. At
that time I had been trying to discover some principle beyond that of practical
expediency which would justify the use of “Student’s” ratio $z=(\bar{x}-m)/s$ in
testing the hypothesis that the mean of the sampled population was at m.

Gosset’s reply had a tremendous influence on the direction of my subsequent
work, for the first paragraph contains the germ of that idea which has formed
the basis of all the later joint researches of Neyman and myself. It is the simple
suggestion that the only valid reason for rejecting a statistical hypothesis is that
some alternative hypothesis explains the observed events with a greater degree
of probability. The second part of the letter probably put into my mind the
very extensive plan of sampling from non-normal populations which we carried
out in the Department of Statistics at University College during the next few
years.

Letter I
From a letter of W. S. G. to E. S. P., dated 11 May 1926.


In your large samples with a known normal distribution you are able to find the chance 
that the mean of a random sample will lie at any given distance from the mean of the 
population. (Personally I am inclined to think your cases are best considered as mine taken 
to the limit large.) That doesn’t in itself necessarily prove that the sample is not drawn 
randomly from the population even if the chance is very small, say .00001: what it does is
to show that if there is any alternative hypothesis which will explain the occurrence of the
sample with a more reasonable probability, say .05 (such as that it belongs to a different
population or that the sample wasn’t random or whatever will do the trick) you will be 
very much more inclined to consider that the original hypothesis is not true. 

I can conceive of circumstances, such for example as dealing a hand of 13 trumps after 
careful shuffling by myself, in which almost any degree of improbability would fail to shake 
my belief in the hypothesis that the sample was in fact a reasonably random one from a 
given population. 


I’m more troubled really by the assumption of normality and have tried from time to 
time to see what happens with other population distributions, but I understand that you 
get correlation between s and m with any other population distribution. 


Still I wish you’d tell me what happens with the even chance population [__] or such a 
one as /\: it’s beyond my analysis. 


* * * * * * * 


If Student is wrong it is up to you to give us something better. You see one must 
experiment and frequently it is quite out of the question, from considerations of cost or of 
impossibility of duplicating conditions in the time scale, to do enough repetitions to define 
one’s variability as accurately as one could wish. It’s no good saying “Oh these small
samples can’t prove anything”. Demonstrably small samples have proved all sorts of
things and it is really a question of defining the amount of dependence that can be placed 
on their results as accurately as we can. Obviously we lose by having a poor definition of 
the variability but how much do we lose? 
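The question raised in this letter, what happens to “Student’s” test when the population is rectangular or triangular, can be explored by random sampling. The following Python sketch is an illustration under stated assumptions and not a reconstruction of the University College investigation: the sample size of 5, the number of trials, the seed and the function names are all choices made here. It uses the t form of the ratio with the normal-theory 5% two-sided limit for 4 degrees of freedom, 2.776.

```python
import random
import statistics

def rejection_rate(draw, true_mean, n=5, crit=2.776, trials=20000, seed=1):
    """Fraction of samples of size n whose |t| exceeds crit when the
    hypothesis tested (population mean = true_mean) is in fact true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        mean = statistics.fmean(sample)
        s = statistics.stdev(sample)              # divisor n - 1
        t = (mean - true_mean) / (s / n ** 0.5)
        hits += abs(t) > crit
    return hits / trials

def rectangular(rng):
    """The 'even chance' population on (0, 1)."""
    return rng.random()

def triangular(rng):
    """A symmetrical triangular population on (0, 1) with its peak at 0.5."""
    return rng.triangular(0.0, 1.0, 0.5)

print("rectangular:", rejection_rate(rectangular, 0.5))
print("triangular: ", rejection_rate(triangular, 0.5))
```

If normal theory held exactly, both frequencies would be near .05; the departure from that figure is a rough measure of what is lost by the assumption of normality.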


Letter II, with its enclosure which, for reasons I have forgotten, was
never published, was written shortly after K. P. had made an editorial
comment on “Sophister’s”* (1928) interpretation of the distribution of
“Student’s” ratio in samples from a non-normal population. It had been found
that in such cases the distribution of t was asymmetrical, but that the distribution
of $|t|$ (or of $t^2$) followed very closely the standard normal-theory form, i.e.
if the distribution of t was curtailed on one side of the origin this was balanced
by a corresponding extension on the other side. The letter also refers to a
suggestion of bringing up at a meeting of the International Statistical Institute
the question of differentiating between the symbols used for probable error and
standard error.


* “Sophister” like “Mathetes” was the nom de plume of a disciple of “Student”. The
particular sampling investigation in question had been sketched out by Gosset and myself before
“Sophister” came to spend a year in the Biometric Laboratory.

Letter II
Holly House, 
Blackrock, 
Co. Dublin. 


May 18th, ’29.
Dear Pearson, 


I was rather amused to see your letter open with an apology for delay in writing 
as I have for some time been acutely conscious that I have been in arrears. However, last
things first. 

(i) I agree that Z’s second suggestion though sound is not workable. Your idea of 
raising the question at Warsaw seems to me to suggest the right way of getting to work. 
I think they should raise the question on the grounds (a) that ± is being used in two senses,
(b) that the prob. error is no longer the slightest use to anyone and (c) that as the tables
are in terms of the S.D. a simple notation such as : or ; or anything of the sort is required.

(ii) I fancy you give me credit for being a more systematic sort of cove than I really am 
in the matter of limits of significance. What would actually happen would be that I should
make out P, (normal) and say to myself “that would be about 50: 1; pretty good but as it
may not be normal we’d best not be too certain”, or “100: 1; even allowing that it may
not be normal it seems good enough” and whether one would be content with that or would
require further work would depend on the importance of the conclusion and the difficulty 
of obtaining suitable experience. 

One so often finds that the importance (and even occasionally the direction of the 
result) of varying one factor, change from experiment (or experience) to experiment accord- 
ing to the accompanying variations in other factors, that it often doesn’t pay to make too 
certain of any one result. 

E.g. You may have two varieties of barley one of which will give the best yield in one 
season or place while the other will win in another season or place; hence we have to sample 
places and seasons widely rather than aim at being meticulously accurate at all places 
sampled: there must be economy of effort. 


* * * * * * * 


Lastly I am enclosing a short note in reply to the Editorial footnote. Probably you are 
going to say all that is at all useful in it in your next paper, and in any case I haven’t the 
least intention of indulging in a controversy, so suppress it unless you think it will clear up 
our position. All the same I think it is a pity to let the thing go by default without any 
comment. 

Yours v. sincerely, 
W. Gosset. 


Suggested Note for “Biometrika” 
17th May, 1929.


In his footnote on page 422 of Sophister’s paper the Editor asks, “Supposing 50 per cent
of prisoners tried for murder were acquitted and the remainder found guilty should we be
right in the long run to drop the trial and toss up for judgment?” This, if I may say so, is
hardly what Sophister proposes to do. If I may deal first with the Editorial analogy the
position is rather, “The evidence before the court is such that the chances are even that
the prisoner committed the murder”. Doubtless if more evidence were forthcoming we
should know more about it; as it is, an English Court will acquit, though the inexorable
Justice of Shan Tien would condemn the prisoner to piecemeal slicing, unless of course
sufficiently weighty evidence for the defence could be imperceptibly introduced within the
Mandarin’s sleeve. But, seriously, a better illustration can be drawn from the practice of
Insurance where in the first place the premium is calculated on the Healthy Male table and,
I suppose, originally this was the only basis after a medical examination. But the material 
which supplied the experience for the H.M. table can be subdivided into various classes, by 
professions and occupations, by stature or eye colour, total abstainers or moderate drinkers 
and so forth, which further investigation may find to have expectations of life which do not
accord with the table. The life expectation of some of these classes is probably taken into
consideration by the Companies—I doubt whether a Lion Tamer, however healthy, could
insure at the ordinary rates—but no company, as it well might, charges a lower rate of
premium for the descendants of centenarians or a higher for orphans; they are most
unfairly lumped together just as Sophister proposes to do with his samples from unknown
populations. In effect he says, “This small sample is from an unknown population, which
may be normal; it probably is not far from normal; if it is normal we use the table justly, if
it is anormal but symmetrical we can still use the table with sufficient accuracy; even if it is
skew, about which we cannot be sure—much less about the direction of the skewness—we 
shall in the long run draw much the same proportion of correct inferences as if it were 
normal.” Admittedly our ignorance of the nature of the population introduces an element 
of uncertainty which no sensible person will ignore when using the tables, but recent work, 
and not least Sophister’s, shows that this uncertainty, while not altogether negligible, is 
much less than we had any right to expect. 


Student. 


The suggestion in Letter III of 1932 ultimately led to the production of
tables of percentage limits of the ratio of (a) range in a sample of n observations
to (b) an independent estimate of standard deviation, which are to be published
shortly in Biometrika. From the beginning of his analysis of the results of
the chess-board experiments, Gosset had wondered how best to judge what
differences among variety means were significant. While the ratio of (a) the
difference between any two means selected at random to (b) the estimate of
standard error could be referred to “Student’s” distribution or, if desired, the
significance of the set as a whole could be judged by Fisher’s z-test, it was not
possible to treat selected differences in either of these ways. In the article in
Baillière’s Encyclopedia (16, p. 1358) he refers to a method suggested by Fisher
of taking the differences between individual variety yields and the mean yield.
He felt however that a knowledge of the probability levels of “studentized”
range would in addition be very useful; on this could be based a rough test of
the kind he had suggested in his paper on “Errors of routine analysis” (15,
p. 161).

Letter III 


St. James’s Gate, 
Dublin. 


Jan. 29th, ’32.
Dear Pearson, 


Many thanks for your letter and enclosure: as I am at the moment 
“The Cook and the Captain bold 
And the mate of the Nancy brig”,
I have handed all the lot to Mathetes till such time as I can get a chance of dealing with 
it which should be sometime next week. 
I have been meaning to write to you for some time re the proposals for the use of range 
and sub-range which I made in my last letter to you. Of course there is a serious crab which 
I had at one time recognised and then forgotten in that the thing would have to be
“Studentised”: the only measure of the S.D. is provided by a limited number of degrees of
freedom. Whether one could get an approximate correction for this with moderately small
numbers by reducing still further the degrees of freedom or whether it would be necessary
as Fisher suggested when I mentioned the matter to him (he was here lecturing) to dive
into the depths of hyperspace to produce the jewel I am not clear, but obviously something
would have to be done about it.


* * * * * * * 


Yrs. v. sincerely 
W.S. Gosset. 


Letters IV and V of 1936, which Dr Beaven has kindly allowed me to repro- 
duce, deal with the interpretation of the results of half drill strip barley experi- 
ments carried out at six stations in England; the two varieties compared were 
Plumage Archer and Beaven’s 35/7. The second letter followed a reply from 
Beaven discussing the position in terms of betting on two horses, whose form 
varies on different courses. The argument illustrates Gosset’s outlook on the 
function of large scale experiments to which I have already referred. 


Letter IV 
From a letter of W. S. G. to E. S. B., dated 8 January 1936.


If you derive the S.E. from a set of 10 strips at one station, you are sampling “comparisons
between plots grown at a certain station in the weather of 1935” and can draw the
appropriate conclusion, e.g. that at Sprouston it is quite certain that Beaven’s 35/7 would 
have beaten Plumage Archer in any sound arrangement of plots in 1935. 

When however you regard the six stations as a small sample of the barley land of 
England you can very nearly draw the conclusion that Beaven’s 35/7 would on the average 
have beaten Plumage Archer if compared all over the barley land of England in 1935. 

The chance that so favourable a result would have happened if there were really no 
difference between them is only 1/38, i.e. the odds are 37 to 1 against its happening. This
is very nearly significant but as you know, what odds are to be considered significant is a 
matter of convention—or taste. 

Naturally, in calculating the S.E. (not really an error at all) of the second conception
where the variation from Station to Station depends as much, (or much more ... than), on
the differential response to weather and soil as on the soil errors taken account of in each
station, one takes no particular account of the S.E.’s at the individual stations: one merely
rejoices because the Half Drill Strip method has largely eliminated the errors due to soil 
position and left us mainly the differential response aforesaid, which would have affected 
the result to a greater or less extent in every field of barley-growing England and which 
we have assumed that we have sampled by the six results which we have examined. 

I hope I have made the distinction clear between the S.E. of the result at one station, 
which is rightly derived from the plots grown at that station but which only enables us to 
judge whether the result is significant for that station, and the S.E. of the whole series, 
derivable only from the six mean results of the six stations but which enables us to make an 
estimate of the result of comparing the barleys “everywhere”, where “everywhere” 
represents the whole extent of country that may properly be considered to be sampled by 
the six stations. 
Letter V 


Davan Hollow, 
Denham, 
Bucks. 


14. 1. 36. 
Dear Beaven, 


I don’t think your analogy is quite exact: this is mine. 

The two horses 35/7 and P.A. are known to vary somewhat from day to day and also 
to be very much affected by the particular course on which they are running. 

They have raced ten times at Sprouston and 35/7 has won every time by amounts 
varying from one furlong to two furlongs. At Sprouston then you may lay longish odds on 
35/7. At Cambridge they raced ten times and on the average 35/7 won by 50 yds, the amounts 
varying from 270 yds in favour of 35/7 to 170 in favour of P.A. You would not therefore 
bet very heavily on 35/7 at Cambridge. At four other places 35/7 beat P.A. on average by 
various amounts. What odds is to be given on another hitherto untried course? 

You are surely as much influenced by the narrowness of the margin at Cambridge as by 
the width of it at Sprouston: the new course may resemble the one with just as much likeli- 
hood as the other and may even as far as you can see favour P.A. rather than 35/7, since 
your knowledge of the difference between courses rests on only six cases. 

Furthermore a new method of training may reduce the variation so that the Sprouston 
results may lie between 1¼ and 1¾ furlongs and the Cambridge between 160 yds in favour of 
35/7 and 60 yds in favour of P.A., without altering very much* the odds on a series of races 
on a new course, since the chief source of variation remains the reaction of the horses to the 
courses and not the day to day variation which alone is measured by the variation on a 
single course. 
* * * * * * * 


Yours v. sincerely 
W.S. Gosset. 


* But since the smaller day to day variation prevents an accidentally high or low value of mean 
obscuring the real value of the course there is a better chance of getting the right odds—not of 
getting higher odds. 


Letter VI was written at the time when Gosset was putting together his last 
paper (22). 

Letter VI 
Dart Cottage, 
Postbridge, 
Devon. 


19. iv. 37. 
Dear Pearson, 


Many thanks for yours of 10th; I feel I’m rather wasting your time but as long 
as you ask questions you must expect to get answers. You have given my reason for not 
changing the level of significance viz. that while balancing certainly tends to produce a lower 
real error and consequently higher calculated error one cannot say how much one has 
succeeded in any particular case. I therefore content myself with pointing out that the 
tendency is beneficial, not only are the cases missed of comparatively little value but one 
actually gets more conclusions of real value. 

* * * * * * * 

Now I was talking about Cooperative experiments and obviously the important thing 
in such is to have a low real error, not to have a “significant” result at a particular station. 
The latter seems to me to be nearly valueless in itself. Even when experiments are carried 
out only at a single station, if they are not mere five finger exercises, they will have to be 
part of a series in time so as to sample weather and the significance of a single experiment is 
of little value compared with the significance of the series—which depends on the real error 
not that calculated for each experiment. 

But in fact experiments at a single station are almost valueless; you can say “In heavy 
soils like Rabbitsbury potatoes cannot utilise potash manures”, but when you are asked 
“What are heavy soils like Rabbitsbury?” you have to admit—until you have tried 
elsewhere—that what you mean is “At Rabbitsbury etc.” And that, according to X may mean 
only “In the old cow field at Rabbitsbury”. What you really want to find out is “In what 
soil and under what conditions of weather do potatoes utilise the addition of potash 
manures?” 

To do that you must try it out at a representative sample of the farms of the country 
and correlate with the characters of the soil and weather. It may be that you have an easy 
problem, like our barleys which come out in much the same order wherever—in reason— 
you grow them or like Crowther’s cotton which benefitted very appreciably from nitro-chalk 
in seven stations out of eight, but even then what you really want is a low real error. You 
want to be able to say not only “We have significant evidence that if farmers in general 
do this they will make money by it”, but also “we have found it so in nineteen cases out of 
twenty and we are finding out why it doesn’t work in the twentieth”. To do that you have 
to be as sure as possible which is the 20th—your real error must be small. 


* * * * * * * 


Tedin:* Somerfield sent me the number and I have just had time to glance at it. T. put 
down three kinds of patterns of Latin Squares (5 x 5) on various uniformity trials. There 
were 


    Two Knight’s moves:        Two Diagonals: 

      A  B  C  D  E              A  B  C  D  E 
      D  E  A  B  C              E  A  B  C  D 
      B  C  D  E  A              D  E  A  B  C 
      E  A  B  C  D              C  D  E  A  B 
      C  D  E  A  B              B  C  D  E  A 

and a number of randoms. 
Of course all Latin squares are “balanced” but one wouldn’t care too much for the 
“Diagonal” arrangement and the Knight’s move would, I think, be preferred to all others. 
In conformity with this Tedin found a slight tendency for the Knight’s move to give a low 
actual and a high calculated error while the diagonal tends to give a high actual and a low 
calculated error. The whole thing is not worth worrying about but is interesting as an 
illustration of what actually happens when we depart from artificial randomisation: I 
would Knight’s move every time! 


Yours 
W.S. G. 


P.S. Beaven after all got some slight ailment which prevented his being in the chair for 
Bartlett’s paper: I proposed the vote of thanks....I was heard without enthusiasm but 
there were no cat calls! 


* A reference to the paper by O. Tedin (1931). 

Such are my impressions of Gosset and of his work. Others will have different 
views on the relative importance of his many contributions to statistics; on his 
rightness or wrongness. The experimentalist will have seen him in a different 
light from the mathematician; his personal friends will have realized aspects of 
his character which his correspondents could not see. But all who have known 
him will agree that he possessed almost more of the characteristics of the perfect 
statistician than any man of his time. They will agree, too, on the essential 
balance and tolerance of his outlook, and on that something which a friend of his 
schooldays has described as an “immovable foundation of niceness” which 
made him through life the same friendly dependable person, quiet and unassuming, 
who worked not for the making of personal reputation, but because 
he felt a job wanted doing and was therefore worth doing well. 
THE DISTRIBUTION OF SPEARMAN’S COEFFICIENT OF 
RANK CORRELATION IN A UNIVERSE IN WHICH ALL 
RANKINGS OCCUR AN EQUAL NUMBER OF TIMES 


By M. G. KENDALL, SHEILA F. H. KENDALL 
AND B. BABINGTON SMITH 


PART I. THEORETICAL DETERMINATION OF THE SAMPLING DIS- 
TRIBUTION OF SPEARMAN’S COEFFICIENT OF RANK CORRELATION 


INTRODUCTION 


1. If $n$ individuals are ranked according to two qualities in the orders 
$X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$, where the X’s and the Y’s are permutations of 
the numbers 1 to $n$, the coefficient of rank correlation between the rankings is 
defined as 

$$\rho = 1 - \frac{6\,S(d^2)}{n^3 - n}, \tag{1}$$

where $d_i = X_i - Y_i$ and $S$ denotes summation over the $n$ individuals. The coefficient 
$\rho$, introduced by Spearman (1904), is the product-moment coefficient of correlation 
between X and Y. 

If and only if the correspondence between the two rankings is perfect, i.e. 
$X_i = Y_i$, $\rho = 1$. On the other hand, if and only if the two rankings are exactly 
inverted, i.e. $X_i = Y_{n-i+1}$, $\rho = -1$. In other cases $\rho$ lies between these limits. 


2. In order to judge of the significance of a value of $\rho$ it is necessary to consider 
the distribution of values obtained by correlating an arbitrary order, which may 
conveniently be taken as the order 1, 2, ..., $n$, with all other permutations of the 
numbers 1 to $n$. In practice it is generally more convenient to consider the 
distribution of the quantity $S(d^2)$, which is related to $\rho$ by equation (1). 


3. Certain simple properties of this distribution are obtainable immediately. 

(a) Any value of $S(d^2)$ must be even. For $S(d) = 0$, being the difference of the 
sums of the first $n$ natural numbers; hence the number of odd values of $d$ is even, 
and so is the number of odd values of $d^2$. 

(b) The possible values of $S(d^2)$ range from 0 to $\tfrac13(n^3-n)$ and hence there are 
$\tfrac16(n^3-n)+1$ of them. 

(c) The distribution is symmetrical, about a central value if $\tfrac16(n^3-n)$ is even, 
or about two adjacent central values if $\tfrac16(n^3-n)$ is odd. This follows from the fact 
that to any given value of $\rho$ corresponding to a permutation P there will correspond 
a negative value of $\rho$ of the same absolute value arising from P inverted. 

For, if the permutation P is $X_1, X_2, \ldots, X_n$ the inverted permutation is 
$X_n, X_{n-1}, \ldots, X_1$. $S(d^2)$ calculated from P is then $S(X_i - i)^2$ and that from P 
inverted is $S\{X_i - (n+1) + i\}^2$. The sum of these two is 

$$S(X_i^2) + S(i^2) - 2S(X_i i) + S(X_i^2) + S(n+1-i)^2 - 2S\{X_i(n+1-i)\}.$$

The first, second, fourth and fifth items in this expression are each equal to the 
sum of the squares of the first $n$ natural numbers; the third and sixth, taken 
together, are equal to $-2(n+1)S(X_i) = -2(n+1)\cdot\tfrac12 n(n+1)$. Hence the sum 
of the two values of $S(d^2)$ 

$$= \tfrac23 n(n+1)(2n+1) - n(n+1)^2 = \tfrac13(n^3-n).$$

The result follows simply from equation (1). 


(d) It follows from (c) that all odd moments about the mean of the distribution 
of $S(d^2)$ vanish. 
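
Properties (a) to (c) can be checked by direct enumeration for small n. The following is only an illustrative sketch, not part of the authors' method; the function names are our own, and the natural order 1, 2, ..., n is taken as the fixed ranking.

```python
from itertools import permutations
from collections import Counter

def s_d2_distribution(n):
    """Frequency of each value of S(d^2) over all n! permutations of 1..n."""
    counts = Counter()
    for perm in permutations(range(1, n + 1)):
        counts[sum((x - i) ** 2 for i, x in enumerate(perm, start=1))] += 1
    return counts

def check_properties(n):
    """Return (all values even, range is 0..(n^3-n)/3, distribution symmetrical)."""
    dist = s_d2_distribution(n)
    top = (n ** 3 - n) // 3
    all_even = all(s % 2 == 0 for s in dist)                    # property (a)
    in_range = min(dist) == 0 and max(dist) == top              # property (b)
    symmetric = all(dist[s] == dist[top - s] for s in dist)     # property (c)
    return all_even, in_range, symmetric

for n in range(2, 7):
    print(n, check_properties(n))
```

Enumeration is only practicable for small n, since the number of permutations grows as n!.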


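4. A further important result, due to “Student”, was given by Karl Pearson 
(1907), namely that the second moment of $\rho$ is 

$$\mu_2 = \frac{1}{n-1}, \tag{2}$$

from which it follows at once that the second moment of $S(d^2)$ about its mean is 

$$\mu_2 = \frac{n^2(n+1)^2(n-1)}{36}. \tag{3}$$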
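
The step from (2) to (3) is not written out in the text. A short sketch of it, using only the linear relation (1) between $\rho$ and $S(d^2)$, might run as follows:

```latex
S(d^2) = \tfrac16\,(n^3-n)\,(1-\rho)
\quad\Longrightarrow\quad
\mu_2\{S(d^2)\}
  = \Bigl\{\tfrac16\,(n^3-n)\Bigr\}^{2}\,\mu_2(\rho)
  = \frac{n^2(n-1)^2(n+1)^2}{36}\cdot\frac{1}{n-1}
  = \frac{n^2(n+1)^2(n-1)}{36}.
```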

5. The distribution of $\rho$ has recently been considered by Hotelling and Pabst 
(1936), who have proved the remarkable theorem that as $n$ tends to infinity the 
distribution tends to normality. 

The distributions for low values of $n$, so far as they have been obtained, 
deviate quite considerably from normality and it has not previously been made 
clear how great $n$ must be for normality to be assumed with much confidence, 
particularly in the determination of significance levels. Unfortunately $\rho$ is 
mainly of service in the range n = 10 to n = 30, i.e. precisely where the doubt lies. 
It is the aim of the present paper to throw some light on this crepuscular territory. 


EXPRESSION FOR THE DISTRIBUTION OF $S(d^2)$ 

6. Consider the deviations between the order 1, 2, ..., $n$ and an order X. If 
one deviation is known, then certain deviations become impossible for other 
ranks. For instance, if the deviation $d_1$ between $X_1$ and 1 is $(n-1)$, then $X_1 = n$, 
and it is impossible for the deviation between $X_2$ and 2 to be $(n-2)$; or for the 
deviation between $X_3$ and 3 to be $(n-3)$, and so on. Consider then the array: 

$$\begin{array}{ccccccc}
n-1 & n-2 & n-3 & \cdots & 2 & 1 & 0 \\
n-2 & n-3 & n-4 & \cdots & 1 & 0 & -1 \\
n-3 & n-4 & n-5 & \cdots & 0 & -1 & -2 \\
\vdots & & & & & & \vdots \\
2 & 1 & 0 & \cdots & -(n-5) & -(n-4) & -(n-3) \\
1 & 0 & -1 & \cdots & -(n-4) & -(n-3) & -(n-2) \\
0 & -1 & -2 & \cdots & -(n-3) & -(n-2) & -(n-1)
\end{array}$$


If $d_r$ has the value in the $r$th row and the $k$th column then $d_s$ cannot have the 
value in the $s$th row and the $k$th column; and so on. 
In fact, any permissible set of deviations is given by taking $n$ entries from the 
above table so that no row or column contributes more than one entry. 

Hence to get $S(d^2)$ for any permissible set write 

$$E = \left\{\begin{array}{ccccc}
a^{0} & a^{1} & a^{4} & \cdots & a^{(n-1)^2} \\
a^{1} & a^{0} & a^{1} & \cdots & a^{(n-2)^2} \\
a^{4} & a^{1} & a^{0} & \cdots & a^{(n-3)^2} \\
\vdots & & & & \vdots \\
a^{(n-1)^2} & a^{(n-2)^2} & a^{(n-3)^2} & \cdots & a^{0}
\end{array}\right\}$$

and $S(d^2)$ is given by the index of $a$ of one of the terms obtained from E by choosing 
$n$ factors so that no row or column appears more than once and multiplying them 
together. Thus the distribution of $S(d^2)$ is given by the totality of $n!$ terms which 
can be constructed in that way. E will be taken to be equal to the polynomial 
in $a$ given by the sum of these terms. 
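
The expansion of E can be mechanised. The sketch below is an illustration under our own assumptions rather than the authors' hand procedure: it expands E row by row, keeping track of which columns have already been used, and stores the polynomial in $a$ as a mapping from exponent to coefficient. The names `expand_E` and `summary` are hypothetical.

```python
from functools import lru_cache

def expand_E(n):
    """Return {s: number of permutations with S(d^2) = s}, i.e. the polynomial E."""
    @lru_cache(maxsize=None)
    def expand(row, used):
        # Polynomial for rows row..n-1, given the bitmask of columns already taken.
        if row == n:
            return ((0, 1),)
        poly = {}
        for col in range(n):
            if used & (1 << col):
                continue
            step = (row - col) ** 2            # exponent of the entry a^((row-col)^2)
            for exp, coef in expand(row + 1, used | (1 << col)):
                poly[exp + step] = poly.get(exp + step, 0) + coef
        return tuple(sorted(poly.items()))
    return dict(expand(0, 0))

def summary(n):
    """Total frequency, mean and second moment about the mean of S(d^2)."""
    dist = expand_E(n)
    total = sum(dist.values())
    mean = (n ** 3 - n) // 6
    second = sum(c * (s - mean) ** 2 for s, c in dist.items()) / total
    return total, mean, second

print(summary(8))
```

Because every term of E is positive, no sign bookkeeping is needed, and the same routine can be used to check the total frequency n! and the second moment for any small n.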


7. E bears an obvious analogy to the determinant, but it cannot be regarded 
as such and expanded accordingly. If it could the distribution of $S(d^2)$ would be 
obtained without difficulty, for a determinant with the elements of E as given 
above may be shown to be equal to 

$$(1-a^2)^{n-1}(1-a^4)^{n-2}(1-a^6)^{n-3}\cdots(1-a^{2(n-1)}).$$

E, in fact, lacks the fundamental property of the determinant in that it does not 
change sign if two rows or columns are interchanged. 


8. Nevertheless certain of the rules of determinantal algebra remain true 
for E. The most valuable is that E may be expanded in terms of its minors of 
any order in the usual way. Expansion of this type is, in fact, rather easier with 
E than with the determinant, for all terms of E are essentially positive and there 
are no difficulties with signs. We have used this expansion repeatedly in obtaining 
the distributions given below. There are also certain devices which assist the 
expansion of E in virtue of its symmetry. Two which have been found useful 
are as follows: 

(a) Any minor of E is symmetrical in powers of $a$, i.e. is of the form 

$$A_0 a^{k} + A_1 a^{k+2} + A_2 a^{k+4} + \ldots + A_2 a^{m-4} + A_1 a^{m-2} + A_0 a^{m}.$$

(b) The effect of shifting a minor bodily across E is to multiply each term of 
its expansion by a constant power of $a$. 


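This property may be proved thus: Let an r-rowed minor be 

$$M = \{a^{(\beta_j - \alpha_i)^2}\}, \qquad i, j = 1, 2, \ldots, r,$$

where $\alpha_i$ and $\beta_j$ are the numbers of the rows and columns of E which it occupies. 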




If we shift the minor $\lambda$ places to the left we have 

$$M' = \{a^{(\beta_j - \lambda - \alpha_i)^2}\} = \{a^{(\beta_j - \alpha_i)^2 - 2\lambda(\beta_j - \alpha_i) + \lambda^2}\}.$$

The factor $a^{\lambda^2}$ in the first row is common to all terms and may thus be brought 
outside the curly bracket. Similarly for the other rows. We shall then be left with 
items of type $a^{(\beta-\alpha)^2 - 2\lambda(\beta-\alpha)}$. The factor $a^{2\lambda\alpha_1}$ is common to all members of the first 
row and thus may be brought outside the bracket. The factor $a^{-2\lambda\beta_1}$ is common to 
all terms of the same column and may also be brought outside. Proceeding thus 
with similar terms we shall have 

$$M' = M a^{r\lambda^2 - 2\lambda S(\beta) + 2\lambda S(\alpha)},$$

which is the result stated. 
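For example the minors 

$$M = \left\{\begin{array}{ccc} a^0 & a^1 & a^4 \\ a^1 & a^0 & a^1 \\ a^4 & a^1 & a^0 \end{array}\right\} = a^0 + 2a^2 + 2a^6 + a^8$$

and 

$$M' = \left\{\begin{array}{ccc} a^4 & a^9 & a^{16} \\ a^1 & a^4 & a^9 \\ a^0 & a^1 & a^4 \end{array}\right\} = a^{12}(a^0 + 2a^2 + 2a^6 + a^8)$$

are related by $M' = Ma^{12}$. 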
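
The shifting property (b) can be illustrated numerically on a three-rowed minor. In this sketch, which is ours and not the authors', a minor is represented by its table of exponents; the diagonal minor is taken with exponents $(i-j)^2$ and the displaced one with exponents $(i-j-2)^2$, a displacement of two places being our reading of the example.

```python
from itertools import permutations
from collections import Counter

def expand_minor(exponents):
    """Expand a minor of E (given as a square table of exponents of a).

    Every product of r entries, one from each row and column, contributes a
    term a^(sum of exponents); all terms are taken with a positive sign.
    """
    r = len(exponents)
    poly = Counter()
    for cols in permutations(range(r)):
        poly[sum(exponents[i][cols[i]] for i in range(r))] += 1
    return dict(poly)

M = [[(i - j) ** 2 for j in range(3)] for i in range(3)]
M_shifted = [[(i - j - 2) ** 2 for j in range(3)] for i in range(3)]

print(expand_minor(M))          # exponents 0, 2, 6, 8 with coefficients 1, 2, 2, 1
print(expand_minor(M_shifted))  # the same coefficients, every exponent raised by 12
```

Here r = 3 and the shift is 2, so the common factor is $a^{r\lambda^2} = a^{12}$; the row and column terms cancel because the minor starts on the diagonal.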


9. Even with these aids the evaluation of E is a tedious business, though 
straightforward enough. We have found it for values of $n$ from 1 to 8, the resulting 
distributions of $S(d^2)$ being given in Table I. 

As checks on the resulting distribution, it will be remembered that the total 
frequency is $n!$ and the second moment about the mean $\dfrac{n^2(n+1)^2(n-1)}{36}$. 


10. Additional checks on the lower values of $S(d^2)$ may be obtained by considering 
directly the number of permutations giving $S(d^2) = 0, 2, 4, \ldots$, etc. For 
example, with $n$ ranks, a value of 2 can only arise by the sum of terms 1 + 1, which 
in turn can only arise by the interchange of two adjacent terms in the order 
1, 2, ..., $n$. This number of values is therefore $(n-1)$. A value of $S(d^2)$ equal to 4 
can only arise as 1 + 1 + 1 + 1, i.e. by two interchanges of pairs of adjacent terms, 
and the number of ways such interchange can be made is $\dfrac{(n-2)!}{2!\,(n-4)!}$. Expressions 
TABLE I 
Distributions of $S(d^2)$ for values of n from 1 to 8 

                              Values of n 
 S(d^2)      1     2     3     4     5     6      7       8 
    0        1     1     1     1     1     1      1       1 
    2              1     2     3     4     5      6       7 
    4                    0     1     3     6     10      15 
    6                    2     4     6     9     14      22 
    8                    1     2     7    16     29      47 
   10                          2     6    12     26      54 
   12                          2     4    14     35      70 
   14                          4    10    24     46      94 
   16                          1     6    20     55     129 
   18                          3    10    21     54     124 
   20                          1     6    23     74     178 
   22                               10    28     70     183 
   24                                6    24     84     237 
   26                               10    34     90     238 
   28                                4    20     78     276 
   30                                6    32     90     264 
   32                                7    42    129     379 
   34                                6    29    106     349 
   36                                3    29    123     380 
   38                                4    42    134     400 
   40                                1    32    147     517 
   42                                     20     98     394 
   44                                     34    168     542 
   46                                     24    130     492 
   48                                     28    175     640 
   50                                     23    144     557 
   52                                     21    168     666 
   54                                     20    144     595 
   56                                     24    184     776 
                                             (median) 
   58                                     14            684 
   60                                     12            786 
   62                                     16            718 
   64                                      9            922 
   66                                      6            745 
   68                                      5            917 
   70                                      1            781 
   72                                                   982 
   74                                                   826 
   76                                                   950 
   78                                                   844 
   80                                                  1066 
   82                                                   845 
   84                                                   936 
                                                     (median) 
 Total       1     2     6    24   120   720   5040*  40320* 

* Total of whole distribution, only the median value and the values on one side of the median 
being shown in this table. 
of this type, however, rapidly become very complicated. For values of $S(d^2)$ up 
to and including 22 we find the following frequencies: 

$$\begin{array}{rl}
S(d^2) & \text{Frequency} \\[4pt]
0 & 1 \\[4pt]
2 & n-1 \\[4pt]
4 & \binom{n-2}{2} \\[4pt]
6 & \binom{n-3}{3} + 2\binom{n-2}{1} \\[4pt]
8 & \binom{n-4}{4} + 4\binom{n-3}{2} + \binom{n-2}{1} \\[4pt]
10 & \binom{n-5}{5} + 6\binom{n-4}{3} + 2\binom{n-3}{2} + 2\binom{n-3}{1} \\[4pt]
12 & \binom{n-6}{6} + 8\binom{n-5}{4} + 3\binom{n-4}{3} + 4\binom{n-4}{2} + 4\binom{n-4}{2} + 2\binom{n-3}{1} \\[4pt]
14 & \binom{n-7}{7} + 10\binom{n-6}{5} + 4\binom{n-5}{4} + 12\binom{n-5}{3} + 6\binom{n-5}{3} \\
   & \quad {} + 4\binom{n-4}{2} + 2\binom{n-4}{1} + 4\binom{n-3}{1} + 4\binom{n-4}{2}
\end{array}$$

[The expressions for $S(d^2)$ = 16, 18, 20 and 22 are too badly damaged in this copy to be restored.] 
These results (the first four of which were given by Hotelling & Pabst (1936)) 
can, of course, be written more simply, but are set out in the above form so that 
the method of obtaining them may be followed more easily. Each term corre- 
sponds to a different type of arrangement required to give the specified value of 
$S(d^2)$. The successive values of the frequency do not appear to conform to any 
simple law, and it is not to be expected that they should, inasmuch as the terms 
composing them depend on the partitions of even numbers into squares. 
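
The closed forms for the lowest values of $S(d^2)$ can be compared with a direct count. The sketch below is an illustration only: the expressions for 0 to 10 are written as we read them from the list above, the helper `binom` (our own) returns zero when the upper index is negative, and the comparison is made by brute force for small n.

```python
from itertools import permutations
from collections import Counter
from math import comb

def binom(m, k):
    """Binomial coefficient, taken as zero when the upper index is negative."""
    return comb(m, k) if m >= 0 else 0

def brute_force(n):
    """Count permutations of n ranks by their value of S(d^2)."""
    counts = Counter()
    for perm in permutations(range(n)):
        counts[sum((x - i) ** 2 for i, x in enumerate(perm))] += 1
    return counts

CLOSED_FORMS = {
    0: lambda n: 1,
    2: lambda n: n - 1,
    4: lambda n: binom(n - 2, 2),
    6: lambda n: binom(n - 3, 3) + 2 * binom(n - 2, 1),
    8: lambda n: binom(n - 4, 4) + 4 * binom(n - 3, 2) + binom(n - 2, 1),
    10: lambda n: (binom(n - 5, 5) + 6 * binom(n - 4, 3)
                   + 2 * binom(n - 3, 2) + 2 * binom(n - 3, 1)),
}

for n in range(3, 8):
    counted = brute_force(n)
    print(n, [(s, f(n), counted[s]) for s, f in CLOSED_FORMS.items()])
```

Each term in a closed form counts one type of arrangement, for instance $\binom{n-2}{2}$ for two separate interchanges of adjacent ranks.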
11. The distributions of Table I are peculiar in several respects. For lower 
values of n they are distinctly bimodal. For n = 7 and n = 8 the frequency polygons 
have an unusual serrated profile, as may be seen from Fig. 1, though normality 
is beginning to emerge. 

[Fig. 1. Frequency polygons of $S(d^2)$ for n = 7 and n = 8. Horizontal axes: values of $S(d^2)$, from 0 to 112 for n = 7 and from 0 to 168 for n = 8; vertical axes: frequency.] 
12. The value of $\beta_2$ for the distribution of $\rho$ was given by Hotelling & Pabst 
(1936) as 

$$\beta_2 = \frac{3(25n^4 - 13n^3 - 73n^2 + 37n + 72)}{25n(n+1)^2(n-1)},$$

and it follows that 

$$\beta_2 = 3 - \frac{114}{25n} - \frac{6}{5n^2} - \ldots \tag{8}$$

The values of $\beta_2 - 3$ for certain values of $n$ are shown in Table II. The distribution 
is platykurtic, approaching mesokurtosis as $n$ becomes larger. But as might have 
been expected $\beta_2$ fails to reveal the serrated appearance of the frequency polygon 
for low $n$. 


TABLE II 
Values of $\beta_2 - 3$ in the distribution of $S(d^2)$ for various values of n 

   n    beta_2 - 3        n    beta_2 - 3 
                         15     -0.308 
   2     -2.000          20     -0.230 
   5     -0.928          25     -0.184 
  10     -0.464          30     -0.153 
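
The entries of Table II follow directly from the Hotelling and Pabst expression for $\beta_2$. As an illustrative check (the function name is our own):

```python
def beta2_minus_3(n):
    """Excess of beta_2 over 3 for Spearman's coefficient with n ranks."""
    numerator = 3 * (25 * n**4 - 13 * n**3 - 73 * n**2 + 37 * n + 72)
    denominator = 25 * n * (n + 1) ** 2 * (n - 1)
    return numerator / denominator - 3

for n in (2, 5, 10, 15, 20, 25, 30):
    print(n, round(beta2_minus_3(n), 3))
```

The excess is negative for every n shown and shrinks roughly in proportion to 1/n, which is the platykurtosis described in the text.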


13. So far as we have calculated the distribution, the serrations in the frequency 
polygon show no signs of disappearing over the main range, and it is not imme- 
diately obvious what happens as n becomes larger and the polygon tends to 
normality. From the form of Fig. 1, however, it would seem that the tails of the 
curve smooth out first, and that the smoothness runs up towards the apex of the 
distribution as n tends to infinity. 


PITMAN’S APPROXIMATION 


14. It will be clear, we think, that at least for n = 8 or less the normal curve 
offers only an indifferent representation of the distribution of $S(d^2)$. For example, 
in the distribution for n = 8, the chance of getting a value of $S(d^2)$ outside the 
range 14-154 (i.e. as great as or greater than 156, or as small as or less than 12) 
is 0.0107 (the nearest point to a 1 % significance level in this discontinuous case). 
If the distribution were taken to be normal with the same mean and standard 
deviation (in this case $12\sqrt{7}$) the chance would be 0.0233. A correction for 
continuity would not improve matters materially. 
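
The comparison just made can be reproduced by enumeration. This is only a sketch with function names of our own; the normal tail is taken without a continuity correction and is obtained from the complementary error function.

```python
from itertools import permutations
from math import erfc, sqrt

def exact_two_sided_tail(n, low):
    """Chance that S(d^2) <= low or S(d^2) >= maximum - low, by full enumeration."""
    top = (n**3 - n) // 3
    hits = total = 0
    for perm in permutations(range(n)):
        s = sum((x - i) ** 2 for i, x in enumerate(perm))
        total += 1
        hits += (s <= low) or (s >= top - low)
    return hits / total

def normal_two_sided_tail(n, low):
    """The same chance for a normal curve with the same mean and standard deviation."""
    mean = (n**3 - n) / 6
    sd = sqrt(n**2 * (n + 1) ** 2 * (n - 1) / 36)
    z = (mean - low) / sd
    return erfc(z / sqrt(2))

print(round(exact_two_sided_tail(8, 12), 4), round(normal_two_sided_tail(8, 12), 4))
```

For n = 8 the normal curve gives a tail rather more than twice the true one at this point.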


15. Pitman (1937), observing that the first four moments of $\rho$ are approximately 
the same as those of the B-distribution 

$$df = \frac{1}{B(\tfrac12 n - 1, \tfrac12)}\,(1 - x^2)^{\frac12(n-4)}\,dx, \tag{9}$$

has suggested that the probability integral of this curve, i.e. a Pearson Type II 
curve, may be used for that of $\rho$; and says that the true values agree well with 
those of the approximate distribution even for values of $n$ as low as 6, which is 
apparently the greatest value of $n$ for which he had the actual distribution. This 
is true over the greater part of the range, and the B-distribution appears to give 
a fair idea of the true values of the significance points. 

For instance, with n = 8 the distribution becomes 

$$df = \frac{1}{B(3, \tfrac12)}\,(1 - x^2)^2\,dx,$$

and by direct integration the probability of a value greater than $x$ in absolute 
value is 

$$1 - \frac{15}{8}\left(x - \frac{2x^3}{3} + \frac{x^5}{5}\right). \tag{10}$$

The chance of getting a value of $S(d^2)$ outside the range 14-154 is, as above, 
0.0107. The chance calculated from the formula (10), with a correction for 
continuity,* is 0.0098. 

Similarly, the chance of getting a value outside the range 26-142 is 0.0576. 
That given by the formula (10) is 0.0561. 
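
The two figures obtained from formula (10) can be recomputed as follows. This is a sketch under two assumptions of ours: the deviate is measured from the centre $\tfrac16(n^3-n)$ in units of $\tfrac16(n^3-n)+1$, as the footnote on the continuity correction suggests, and the dividing point is placed midway between the last value counted as outside the range and the first value inside it (13 and 25 in the two examples).

```python
def pitman_tail_n8(x):
    """Formula (10): chance of a value greater than x in absolute value, n = 8."""
    return 1 - (15 / 8) * (x - 2 * x**3 / 3 + x**5 / 5)

def deviate(boundary, n):
    """Deviate on the scale of the B-curve for a dividing point on the S(d^2) scale."""
    centre = (n**3 - n) / 6
    return abs(boundary - centre) / (centre + 1)

print(round(pitman_tail_n8(deviate(13, 8)), 4))   # range 14-154
print(round(pitman_tail_n8(deviate(25, 8)), 4))   # range 26-142
```

With these conventions the sketch gives tail chances that round to the 0.0098 and 0.0561 quoted in the text.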


16. It may be expected that for larger values of n Pitman’s approximation 
is closer, and would probably provide a satisfactory test of significance for 
practical purposes. 

Furthermore, there is an extremely close relation between $\rho$ and another 
measure of rank correlation suggested elsewhere (Kendall, 1938) the sampling 
distribution of which may be readily obtained. (This relation is discussed below 
in Part 3 of this paper.) For these reasons we have not thought it necessary to 
embark on the labour of determining E for values of $n$ greater than 8. 

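It is, perhaps, worth noting that the probability integral of the curves (9) 
may be related to “Student’s” t-integral by the transformation 

$$t = x\sqrt{\frac{n-2}{1-x^2}},$$

which gives, on substitution in (9), 

$$df = \frac{1}{\sqrt{n-2}\;B(\tfrac12 n - 1, \tfrac12)}\,\frac{dt}{\{1 + t^2/(n-2)\}^{\frac12(n-1)}},$$

the “Student” form with $n - 2$ degrees of freedom. 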
The deviate $x$ corresponding to a value S of $S(d^2)$, with continuity corrections, 
is 

$$x = \frac{S - \tfrac16(n^3-n)}{\tfrac16(n^3-n) + 1}.$$

If $n$ is large the denominator term may be taken to be $\tfrac16(n^3-n)$, and to this 
approximation $x = -\rho$. 


* The continuity correction was made by assuming the range of the B-curve to be equivalent 
to the range $-1$ to $\tfrac13(n^3-n)+1$ for $S(d^2)$, i.e. the terminal frequencies were assumed distributed 
over a range of two units, one unit on each side of the terminal ordinates. 
We thus reach the notable result that the approximate significance points 
of $\rho$ may be determined from “Student’s” t-distribution with $n-2$ degrees of 
freedom by writing $t = \rho\sqrt{(n-2)/(1-\rho^2)}$. 


A transformation of the same kind may be used to test the significance of a 
value of the product-moment coefficient of correlation of a sample of $n$ from an 
uncorrelated bivariate normal universe. The resemblance between such a coefficient 
and $\rho$ becomes even more striking when it is remembered that the former 
has the same variance as has $\rho$ in the case under consideration. 
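
The equivalence between the B-curve and the “Student” form can be checked numerically. The sketch below uses our own function names and integrates the t-density by Simpson's rule rather than any library routine. It compares, for n = 8, the tail given by formula (10) with the two-sided tail of t on 6 degrees of freedom.

```python
from math import sqrt, gamma, pi

def t_from_rho(rho, n):
    """The transformation t = rho * sqrt((n - 2) / (1 - rho^2))."""
    return rho * sqrt((n - 2) / (1 - rho**2))

def student_two_sided_tail(t, df, steps=2000):
    """P(|T| > t) for Student's t, by Simpson's rule on the density over [0, t]."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    def density(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)
    h = t / steps
    total = density(0) + density(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    return 1 - 2 * total * h / 3

x = 0.5
tail_from_formula_10 = 1 - (15 / 8) * (x - 2 * x**3 / 3 + x**5 / 5)
tail_from_student = student_two_sided_tail(t_from_rho(x, 8), df=6)
print(tail_from_formula_10, tail_from_student)
```

The two tails agree to within the error of the numerical integration, as the change of variable requires.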


PART II. EXPERIMENTAL DISTRIBUTIONS OF $\rho$ 


17. As an alternative to calculating E for values of $n$ greater than 8 we have 
conducted some experiments to find empirically the distribution of $S(d^2)$ for 
n = 10 and n = 20. 

For the cases n = 10 and n = 20 sets of permutations of the numbers 0 to 9 
and 1 to 20 were constructed from the tables of Tippett (1927) in the manner 
described below. There is reason to suppose that the coefficients or values of $S(d^2)$ 
calculated from these data are a random and representative selection from the 
possible values. 


METHOD OF OBTAINING DATA FOR PERMUTATIONS OF 10 


18. The 2000 permutations of the numbers 0 to 9 were obtained in the 
following way: the observer went through Tippett’s numbers (beginning on the 
first page and reading across), writing down the digits as they occurred but 
omitting those which had occurred already in the particular permutation he was 
constructing. When nine out of the possible ten had occurred the tenth was filled 
in without reference to the tables and the observer began on a new permutation. 
Thus, the first 51 of Tippett’s numbers are 


2952 6641 3992 9792 7979 5911 3170 5624 
4167 9524 1545 1396 720 


The first permutation will be 2 9 5 6 4 1 3 7 0 8, involving the first twenty-eight 
numbers. The last figure contributed from the table is 0, the 8 being filled in 
automatically; so beginning with the twenty-ninth figure, we find that the second 
permutation is 5 6 2 4 1 7 9 3 0 8. 
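
The procedure of this paragraph is easily expressed as a short routine. The following sketch, with names of our own choosing, applies it to the 51 digits quoted above and recovers the two permutations given in the text.

```python
def permutations_from_digits(digits):
    """Build permutations of 0..9 from a stream of digits, as described above."""
    current = []
    for d in digits:
        if d in current:
            continue                      # digit already present: pass over it
        current.append(d)
        if len(current) == 9:             # nine found, so the tenth is forced
            missing = (set(range(10)) - set(current)).pop()
            yield current + [missing]
            current = []

QUOTED = "2952 6641 3992 9792 7979 5911 3170 5624 4167 9524 1545 1396 720"
digits = [int(c) for c in QUOTED if c.isdigit()]

for perm in permutations_from_digits(digits):
    print(perm)
```

Any digits left over after the last complete permutation are simply carried into the next one when more of the table is read.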


19. The 2000 permutations were found to require 39,183 digits, an average 
of 19.59 per permutation, and hence cover practically the whole of Tippett’s 
table. We discuss the relationship between the expected and observed average 
run in the Appendix. So far as our tests show, the values of $S(d^2)$ obtained may 
be regarded as reasonably random, and are representative of the theoretical 
distribution within sampling limits.* 
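
For comparison with the observed average of 19.59, the expected number of digits needed to obtain nine different digits out of ten can be written down if the digits are assumed independent and equally likely. This is a standard waiting-time argument offered as a sketch; it is not necessarily the calculation made in the Appendix.

```python
from fractions import Fraction

def expected_digits(distinct_needed=9, alphabet=10):
    """Expected draws to see `distinct_needed` different symbols out of `alphabet`.

    When k different symbols have already been seen, the wait for a new one
    is geometric with mean alphabet / (alphabet - k).
    """
    return sum(Fraction(alphabet, alphabet - k) for k in range(distinct_needed))

expected = expected_digits()
observed = 39183 / 2000
print(float(expected), observed)
```

On these assumptions the expectation is a little under 19.3 digits per permutation, somewhat below the observed average.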


METHOD OF OBTAINING DATA FOR PERMUTATIONS OF 20 


20. For the permutations of 20 a rather different technique was adopted. 
Each pair of Tippett’s digits was taken to give a number from 1 to 20, numbers 
of 21 or greater being reduced by subtracting multiples of 20. Thus, the numbers 
21, 41, 61, 81 were taken as giving the number 1, the numbers 00, 40, 60, 80 as 
giving the number 20, and so on. This process can be carried out at sight, and, as 
for the permutations of 10, an observer went through Tippett’s tables reading 
out each pair so obtained. The first eight Tippett’s digits, 2952 6641, thus yield 
four numbers 9, 12, 6, 1. 

In order to eliminate errors, a second observer was provided with a working 
sheet of paper and a slip of cardboard on which were written the numbers 1, 2, ..., 
20 in their natural order. This was adjusted on the working sheet so as to lie a 
little higher up the sheet than the twenty spaces in which the random permutation 
was to be written, the number 1 lying above the first space and so on. The numbers 
read from Tippett’s tables were then numbered serially on the working sheet in 
the order in which they occurred. Thus, for the sequence 9, 12, 6, 1, the second 
observer would write the numbers 1, 2, 3, 4 on the working sheet at the places 
indicated by the figures 9, 12, 6, 1 on the cardboard slip. When the first observer 
read out a number which had occurred already in the permutation under construction, 
the second observer ignored it—and could do so without possibility 
of error because the space allotted to that number had already been filled. As 
before, when nineteen numbers had been obtained, the last was filled in 
automatically. 
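
The two steps of this technique, the reduction of pairs of digits to numbers from 1 to 20 and the second observer's method of recording, may be sketched as follows. The function names are our own and the routine is an illustration, not a transcript of the authors' working.

```python
def pair_to_rank(pair):
    """Reduce a two-digit number 00..99 to 1..20 by subtracting multiples of 20."""
    remainder = pair % 20
    return 20 if remainder == 0 else remainder

def write_sheet(ranks, n=20):
    """Second observer's sheet: the k-th new number read out has its serial
    number k written in the space which that number indicates."""
    sheet = [None] * n
    serial = 0
    for r in ranks:
        if sheet[r - 1] is not None:
            continue                       # number already read: space is filled
        serial += 1
        sheet[r - 1] = serial
        if serial == n - 1:                # the last space is filled automatically
            sheet[sheet.index(None)] = n
            break
    return sheet

first_pairs = [29, 52, 66, 41]             # from the digits 2952 6641
ranks = [pair_to_rank(p) for p in first_pairs]
print(ranks)
print(write_sheet(ranks))
```

After the four numbers 9, 12, 6, 1 the sheet carries 1, 2, 3, 4 in the ninth, twelfth, sixth and first spaces, the remaining spaces being still empty.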


21. The above process does not give the permutation as it occurs in the table, 
but a second permutation which has elsewhere been called the conjugate of the 
first (Kendall, 1938). Thus, to take a simple case, consider the order 

A 
B 


Rearrange B in the order 1, 2, 3, 4, 5, and rearrange A in the same manner, so 


that any A-number, which lies above a B-number in the above, continues to 
do so, thus A' 


If we repeat the process on A′ and B′ we get back to A and B. B and A′ may be 
called conjugate permutations. 


* The fact that the permutations emanate from Tippett’s numbers would no doubt be accepted 
by many as sufficient guarantee that the resulting values of $S(d^2)$ are a random sample. We 
ourselves felt that further tests were necessary, for reasons given at length elsewhere (Kendall & 
Babington Smith, 1938). 




It is easy to see that if a permutation occurs in Tippett’s tables as B the 
procedure described above will result in A’ being written down. 


22. If B is a random permutation A′ will also be a random permutation. 
Perhaps of more importance for present purposes is the fact that the coefficient 
$\rho$ between the order 1, 2, ..., $n$ and a permutation B is the same as between 
1, 2, ..., $n$ and the conjugate permutation A′. For $\rho$ depends only on the differences 
$d$, and these are the same in the case A, B as in the case A′, B′, though occurring 
in a different order. Hence, for the purposes of calculating, either the permutation 
or its conjugate may be used. The choice between them is entirely a matter of 
convenience, and, as has already been stated, we find that writing down the 
conjugate is simpler and far less liable to error. 
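
The equality of $S(d^2)$ for a permutation and its conjugate can be illustrated on random permutations of 20. In this sketch the conjugate is taken to be the inverse permutation, which is our reading of the construction in paragraph 21; the function names and the seed are our own.

```python
import random

def conjugate(perm):
    """Inverse permutation: entry k of the result is the place of k in perm."""
    inverse = [0] * len(perm)
    for position, value in enumerate(perm, start=1):
        inverse[value - 1] = position
    return inverse

def s_d2(perm):
    """S(d^2) between the natural order 1, 2, ..., n and the permutation."""
    return sum((x - i) ** 2 for i, x in enumerate(perm, start=1))

rng = random.Random(1938)                  # fixed seed, for reproducibility only
for _ in range(5):
    perm = rng.sample(range(1, 21), 20)
    print(s_d2(perm), s_d2(conjugate(perm)))
```

The differences for the conjugate are those of the original with signs reversed and in a different order, so their squares have the same sum.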

Considerations of space prevent us from giving these random permutations 
in full, but we should be glad to place them at the disposal of any workers who 
could find use for them. They can, of course, be used to construct permutations 
of objects fewer in number than 10 or 20 as the case may be, by the omission of 
certain numbers. 


DISTRIBUTION OF $S(d^2)$ IN THE RANKINGS OF 10 


23. The distribution of values of $S(d^2)$ in the 2000 rankings of 10 is given in 
Table III. 

As the frequencies in individual compartments are rather small we have 
grouped them in Table IV. 

It is evident at once that the distribution as judged by the first moment about 
the universe mean ($S(d^2) = 165$) is sufficiently symmetrical. In fact the first 
moment for the grouped distribution of Table IV is 0.18 (expected value zero). 
The variance of the theoretical distribution, from equation (3), is 3025, and hence 
the standard error of the mean of 2000 sets is 1.23. The observed deviation from 
expectation is thus well within sampling limits. 

The same is true of the second moment, the observed value for the grouped 
data of Table IV being 2980.7 (expected value 3025), deviation $-44.3$. The 
standard error of the second moment $= \mu_2\sqrt{(\beta_2 - 1)/2000} = 84$ approximately. 
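
The figures quoted in this paragraph follow from equation (3) and the expression for $\beta_2$ given earlier. A short check, with variable and function names of our own:

```python
from math import sqrt

def sampling_checks(n=10, sample=2000):
    """Mean and variance of S(d^2) and the standard errors quoted in the text."""
    mean = (n**3 - n) / 6
    variance = n**2 * (n + 1) ** 2 * (n - 1) / 36
    beta2 = (3 * (25 * n**4 - 13 * n**3 - 73 * n**2 + 37 * n + 72)
             / (25 * n * (n + 1) ** 2 * (n - 1)))
    se_mean = sqrt(variance / sample)
    se_variance = variance * sqrt((beta2 - 1) / sample)
    return mean, variance, se_mean, se_variance

mean, variance, se_mean, se_variance = sampling_checks()
print(mean, variance, round(se_mean, 2), round(se_variance))
```

The observed deviations, 0.18 for the mean and 44.3 for the second moment, are each well under one standard error.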


24. So far as these tests go, therefore, the distribution conforms to expectation. 
Notable features of the grouped distribution of Table IV are the antimodes 
at $S(d^2)$ = 136-144 and $S(d^2)$ = 206-214. It would appear that for n = 10 
a certain amount of irregularity still persists and that the assumption of normality 
cannot be confidently made near the mean. More important from the sampling 
point of view is the behaviour of the distribution near the tails. Even for a sample 
as large as 2000, the frequencies occurring in the ends of the range are hardly 
big enough to allow a reliable comparison to be made with the theoretical 
frequencies given by the B-curves. Comparisons for some broad groupings, 
however, indicate a reasonable concordance. For example the B-curve for n = 10 
TABLE III 
Distribution of 2000 values of $S(d^2)$ for n = 10 

(Frequencies in the 1st thousand, in the 2nd thousand and in total. A hyphen denotes 
no observations; ? marks an entry that is illegible in this copy.) 

 S(d^2)   1st   2nd  Total     S(d^2)   1st   2nd  Total 
   16      -     -     -        314      ?     ?     1 
   24      -     -     -        306      ?     ?     2 
   26      ?     ?     1        304      ?     ?     1 
   28      ?     ?     1        302      -     -     - 
   30      -     -     -        300      1     1     2 
   34      -     -     -        296      1     1     2 
   36      ?     ?     1        294      ?     ?     1 
   38      ?     ?     1        292      ?     ?     1 
   40      2     -     2        290      1     2     3 
   42      -     2     2        288      2     -     2 
   44      1     2     3        286      1     -     1 
   46      1     2     3        284      ?     ?     1 
   48      3     3     6        282      6     2     8 
   50      2     -     2        280      1     2     3 
   52      3     3     6        278      2     1     3 
   54      1     2     3        276      1     2     3 
   56      1     3     4        274      2     3     5 
   58      2     1     3        272      4     2     6 
   60      3     2     5        270      5     3     8 
   62      2     1     3        268      1     4     5 
   64      6     4    10        266      4     9    13 
   66      3     3     6        264      1     5     6 
   68      4     5     9        262      1     4     5 
   70      2     4     6        260      8     3    11 
   72      2     2     4        258      3     3     6 
   74      4     3     7        256      4     4     8 
   76      6     3     9        254      4     7    11 
   78      9     4    13        252      5     7    12 
   80      9     5    14        250      8     7    15 
   82      5     5    10        248      5     3     8 
   84     10     4    14        246      3     7    10 
   86      3     7    10        244      3     4     7 
   88      7     5    12        242      7     3    10 
   90     10     5    15        240      5     2     7 
   92      4     8    12        238      6     5    11 
   94      7     7    14        236      7     2     9 
   96      4     7    11        234     13     5    18 
   98     11     7    18        232      9     5    14 
  100     10     4    14        230      5    13    18 
  102      ?     ?    27        228      9     7    16 
  104      ?     ?    20        226      8     9    17 
  106      ?     ?    15        224      8     7    15 
  108      9    11    20        222     11     9    20 
  110      ?     ?    23        220      9    11    20 
  112      8    18    26        218     12     9    21 
  114     11     9    20        216     13    13    26 
  116      ?     ?    28        214      8     4    12 
  118     11    15    26        212      6    10    16 
  120     12    12    24        210     13    14    27 
  122      8     5    13        208     12     6    18 
  124      ?     ?    19        206      7    12    19 
  126      ?     ?    21        204      8    16    24 
  128      ?     ?    26        202     14     8    22 
  130      7    17    24        200     12    12    24 
  132      ?     ?    18        198      9    16    25 
  134      ?     ?    28        196     11    14    25 
  136      6    10    16        194     13    14    27 
  138      9     8    17        192     12    12    24 
  140     10    10    20        190     13    13    26 
  142      ?     ?    24        188     16    16    32 
  144      5    14    19        186     12    18    30 
  146      ?     ?    31        184     12    14    26 
  148      ?     ?    24        182     13    11    24 
  150      ?     ?    24        180     15    11    26 
  152      ?     ?    18        178      ?     ?    36 
  154      ?     ?    28        176      ?     ?    22 
  156     14     7    21        174     15    16    31 
  158     20     8    28        172     11    18    29 
  160     12     6    18        170      ?     ?    27 
  162     10    14    24        168     14    10    24 
  164     17    21    38        166     10    20    30 

 Totals  502   480   982       Totals  498   520  1018 
TABLE IV
Distribution of the 2000 sets of Table III, condensed

S(d²)      Frequency      S(d²)      Frequency
  0-  4        -          326-330        -
  6- 14        -          316-324        -
 16- 24        -          306-314        3
 26- 34        2          296-304        5
 36- 44        9          286-294        8
 46- 54       20          276-284       18
 56- 64       25          266-274       37
 66- 74       32          256-264       36
 76- 84       60          246-254       56
 86- 94       63          236-244       44
 96-104       90          226-234       83
106-114      104          216-224      102
116-124      110          206-214       92
126-134      117          196-204      120
136-144       96          186-194      139
146-154      125          176-184      134
156-164      129          166-174      141

Total        982          Total       1018


TABLE V
Distribution of 400 values of S(d²) for n = 20

S(d²)      Frequency
 400-498       1
 500-          0
 600-          5
 700-         10
 800-         22
 900-         22
1000-         40
1100-         44
1200-         51
1300-         56
1400-         42
1500-         35
1600-         33
1700-         17
1800-          8
1900-          8
2000-          4
2100-          2

Total        400



gives a chance of 0.9891 that a value of S(d²) will fall inside the range 38-292.
The expected frequency outside this range in 2000 rankings is therefore 22, with
a standard error of approximately √22. The observed frequency is 14. Similarly,
for the 5 % level, the chance of a value falling inside the range 60-270 is 0.9474
and the expected frequency is thus 105; the observed frequency is 96.
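
The arithmetic of these two comparisons can be checked with a short script. This is only a minimal sketch and not part of the original analysis: the function name `tail_check` is hypothetical, and the standard error is taken as the square root of the expected count, which is the approximation quoted in the text.

```python
import math

def tail_check(n_samples, p_inside, observed):
    """Expected count outside a range, its approximate standard error,
    and the standardized deviation of the observed count."""
    expected = n_samples * (1.0 - p_inside)
    std_err = math.sqrt(expected)          # approximation used in the text
    z = (observed - expected) / std_err
    return expected, std_err, z

# Range 38-292 (chance 0.9891 inside), 14 values observed outside
exp1, se1, z1 = tail_check(2000, 0.9891, 14)
# Range 60-270 (chance 0.9474 inside), 96 values observed outside
exp5, se5, z5 = tail_check(2000, 0.9474, 96)

print(round(exp1), round(se1, 2), round(z1, 2))
print(round(exp5), round(se5, 2), round(z5, 2))
```

Both observed counts fall short of expectation by less than twice the approximate standard error, which is the sense in which the concordance is called reasonable.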

Fig. 2 gives the histogram of the data of Table IV with the curve

$$y = k(1 - x^2)^3$$

of equal range and equal area. So far as the eye can judge the correspondence is 
reasonably good. 


DISTRIBUTION OF S(d²) IN THE RANKINGS OF 20


25. The distribution of S(d²) in the 400 rankings of 20 is given in a grouped
form in Table V. The alternation of modes and antimodes has now disappeared,
though it might emerge with finer grouping.

The mean value of S(d²) (about origin 1330), as grouped in Table V, is −18.5,
the expected value being zero. The variance of the theoretical distribution is
93,100, so that the standard error of the mean of 400 sets is 15.2. Again the
observed deviation is well within sampling limits.

The same is true of the variance, the deviation of the observed from the
theoretical value being 714 with a standard error of 6193 approximately.
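
The figures for the mean can be reproduced as follows. This is a sketch under stated assumptions rather than the authors' own computation: the theoretical mean (n³ − n)/6 and variance n²(n + 1)²(n − 1)/36 of S(d²) are the standard values for equally likely rankings, the class "400-" is taken to cover 400, 402, ..., 498 (central value 449), and the frequency of the 1400- class is taken as 42, the value needed for the column to add up to the stated total of 400.

```python
import math

n, sets = 20, 400

# Theoretical moments of S(d^2) when all rankings are equally likely
mean_theory = (n**3 - n) / 6                      # 1330
var_theory = n**2 * (n + 1)**2 * (n - 1) / 36     # 93,100
se_mean = math.sqrt(var_theory / sets)

# Grouped frequencies of Table V: lower class limit -> frequency.
# ASSUMPTION: the 1400- class is 42, so that the total is 400.
table_v = {400: 1, 500: 0, 600: 5, 700: 10, 800: 22, 900: 22,
           1000: 40, 1100: 44, 1200: 51, 1300: 56, 1400: 42, 1500: 35,
           1600: 33, 1700: 17, 1800: 8, 1900: 8, 2000: 4, 2100: 2}

# S(d^2) is always even, so each class of width 100 is centred at lo + 49.
total = sum(table_v.values())
grouped_mean = sum((lo + 49) * f for lo, f in table_v.items()) / total
deviation = grouped_mean - mean_theory

print(mean_theory, var_theory, round(se_mean, 2))
print(total, deviation, round(deviation / se_mean, 2))
```

Under these assumptions the grouped mean lies 18.5 below 1330, a little more than one standard error, in agreement with the statement that the deviation is well within sampling limits.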
26. The experimental evidence, so far as it goes, confirms the theory. It would
appear that for values of n equal to 10 or less the distribution of ρ cannot be
taken to be normal to a satisfactory degree of approximation. The Type II
B-curves proposed by Pitman are better, but they are possibly inadequate to
represent frequencies in narrow ranges. They do, however, appear to be
sufficient to determine significance points.

Fig. 2. Histogram of the data of Table IV, together with the curve
y = k(1 − x²)³ of equal area and corresponding range


PART III. RELATIONSHIP BETWEEN SPEARMAN’S COEFFICIENT 
AND ANOTHER COEFFICIENT OF RANK CORRELATION 

27. One of us (Kendall, 1938) has suggested a measure of rank correlation
whose sampling distribution can be obtained without much difficulty. For
practical purposes the coefficient, denoted by τ, is most easily calculated as
follows:

Let X₁, X₂, ..., Xₙ be a permutation of the first n natural numbers. Suppose
there are, to the right of X₁, k₁ numbers greater than X₁, to the right of X₂,
k₂ numbers greater than X₂, and so on. If

$$\Sigma = 2(k_1 + k_2 + \dots + k_{n-1}) - \tfrac{1}{2}n(n-1), \tag{11}$$

the coefficient of rank correlation between the order X and the natural order
1, 2, ..., n is defined as

$$\tau = \frac{2\Sigma}{n(n-1)}. \tag{12}$$

In the paper under reference it was shown that τ can vary from −1 to +1, has
variance equal to 2(2n + 5)/{9n(n − 1)} in the universe in which all rankings
appear equally frequently, and is normally distributed for large n. A method of
obtaining the distribution for small n was given together with the actual
distribution for values of n equal to 10 or less.
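
The counting rule for τ translates directly into code. The sketch below is illustrative only: the function names are hypothetical, and Σ is taken as twice the sum of the kᵢ less n(n − 1)/2, so that τ runs from −1 to +1. It also checks the quoted variance by enumerating all n! rankings for small n.

```python
from fractions import Fraction
from itertools import permutations

def kendall_tau(perm):
    """tau of `perm` against the natural order, via the counts k_i."""
    n = len(perm)
    # k_i = how many entries to the right of X_i exceed X_i
    k = sum(sum(1 for y in perm[i + 1:] if y > x) for i, x in enumerate(perm))
    sigma = 2 * k - Fraction(n * (n - 1), 2)
    return 2 * sigma / (n * (n - 1))

def tau_variance(n):
    """Exact variance of tau over all n! equally likely rankings."""
    taus = [kendall_tau(p) for p in permutations(range(1, n + 1))]
    mean = sum(taus) / len(taus)
    return sum((t - mean) ** 2 for t in taus) / len(taus)

print(kendall_tau((1, 2, 3, 4)), kendall_tau((4, 3, 2, 1)))  # 1 and -1
for n in range(2, 7):
    assert tau_variance(n) == Fraction(2 * (2 * n + 5), 9 * n * (n - 1))
```

Exact rational arithmetic is used so that the comparison with 2(2n + 5)/{9n(n − 1)} is an equality rather than a floating-point approximation.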


28. Different as ρ and τ might be expected to be from consideration of their
methods of calculation, they were frequently found in practice to give
numerical values which are remarkably close, even for low values of n. It
therefore seemed worth while to investigate the relationship between them.

Each of the n! permutations of the first n natural numbers will, in relation to
the order 1, 2, ..., n, give a pair of values of ρ and τ. The ideal would be to
find the bivariate frequency table into which these values fall when arranged
according to the values of ρ and τ (or, more conveniently, of S(d²) and Σ).
Such a distribution must necessarily be extremely complicated when expressed in
general terms, for even one of its border frequencies is the complex
distribution of ρ (or S(d²)).

It appears, however, to be possible to find a comparatively simple expression
for the product-moment coefficient of correlation in such a table. This
coefficient, denoted by $r_{\rho\tau}$ or $r_{S\Sigma}$ as the case may be,
gives a reliable measure of the correspondence between ρ and τ inasmuch as the
distribution of each is single-humped and tends to normality as n becomes
larger.


29. By actually constructing the bivariate table for values of n from 2 to 6
inclusive we have found that, for such values,

$$r_{\rho\tau} = \frac{2(n+1)}{\sqrt{2n(2n+5)}} \tag{13}$$

with a corresponding value for the covariance of S and Σ,

$$\mu_{11} = -\tfrac{1}{18}\,n(n+1)^2(n-1). \tag{14}$$

The actual correlation table for n = 6 is given in Table VI. The close relation
between the variates is immediately evident, and it is of some interest to note
that the regression is not quite linear. Presumably, however, it approaches
linearity as n becomes larger. Both variates tend to normality and though this
in itself is insufficient to guarantee linearity of regression, the fact that
$r_{\rho\tau}$ tends to unity makes it very probable that the joint
distribution tends to the bivariate normal surface.
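
The construction of the bivariate table for n = 2 to 6 can be repeated by brute force. The sketch below is an illustration, not the authors' procedure: the function names are hypothetical, Σ is computed as twice the number of concordant pairs less n(n − 1)/2, and the covariance is taken to be negative because a large S(d²) goes with a small Σ. Exact fractions are used throughout.

```python
from fractions import Fraction
from itertools import permutations

def s_and_sigma(perm):
    """S(d^2) and Sigma of `perm` against the natural order 1..n."""
    n = len(perm)
    s = sum((x - i) ** 2 for i, x in enumerate(perm, start=1))
    k = sum(1 for i in range(n) for j in range(i + 1, n) if perm[j] > perm[i])
    return s, 2 * k - Fraction(n * (n - 1), 2)

def joint_moments(n):
    """Exact mu_20, mu_02, mu_11 of (S, Sigma) over all n! rankings."""
    pairs = [s_and_sigma(p) for p in permutations(range(1, n + 1))]
    m = len(pairs)
    ms = Fraction(sum(s for s, _ in pairs), m)
    mg = sum(g for _, g in pairs) / m
    mu20 = sum((s - ms) ** 2 for s, _ in pairs) / m
    mu02 = sum((g - mg) ** 2 for _, g in pairs) / m
    mu11 = sum((s - ms) * (g - mg) for s, g in pairs) / m
    return mu20, mu02, mu11

for n in range(2, 7):
    mu20, mu02, mu11 = joint_moments(n)
    # covariance of S and Sigma
    assert mu11 == Fraction(-n * (n + 1) ** 2 * (n - 1), 18)
    # squared correlation, identical for (rho, tau) and (S, Sigma)
    assert mu11 ** 2 / (mu20 * mu02) == Fraction(4 * (n + 1) ** 2, 2 * n * (2 * n + 5))
    print(n, mu11, float(-mu11 / (mu20 * mu02) ** 0.5))
```

The enumeration confirms the stated expressions only for the values of n actually tried; it says nothing about general n, which is the subject of the argument that follows.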


30. We have not succeeded in finding a rigid proof that equation (14) is true
for all n. The following line of argument, however, appears to make it highly
probable that (13) and (14) are of general application.

μ₁₁ n!, the product sum of Σ and S(d²) (the latter measured from its mean),
is clearly an integer and is a function of n only; for when n is fixed it is
completely determined. The analogous quantities μ₂₀ n! (the sum of squares of
S) and μ₀₂ n! (the sum of squares of Σ) are respectively

$$\frac{n^2(n+1)^2(n-1)}{36}\,n! \quad\text{and}\quad \frac{n(n-1)(2n+5)}{18}\,n!.$$

One suspects therefore that μ₁₁ n! is equal to f(n) n!, where f(n) is a
polynomial in n; in other words, that μ₁₁ = f(n).

If this is so, f(n) cannot be of higher degree than four, for the product of
μ₂₀ and μ₀₂ is of degree 8 and otherwise r would be greater than unity for some
large n. Hence if a polynomial of degree four or less can be found which takes
the observed values of μ₁₁ for five cases, that polynomial is equal to μ₁₁.
Equation (14) satisfies the condition and thus is true in general.

In actual fact (14) is also satisfied in the degenerate case n = 1, but (13) is
not owing to the omission of two factors which cancel for n > 1 but are zero
for n = 1.


31. If formula (13) is in fact true, the following are the values of r
corresponding to some values of n:

 n      $r_{\rho\tau}$
 5      0.980
10      0.984
15      0.988
20      0.990

To verify the result for n = 10 we found the coefficient of correlation between
the values of ρ and τ for 1000 of the experimental permutations. This value was
0.980.
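
The tabulated values follow directly from formula (13), read as r = 2(n + 1)/√{2n(2n + 5)}. A minimal sketch (the function name `r_rho_tau` is hypothetical):

```python
import math

def r_rho_tau(n):
    """Correlation between rho and tau according to formula (13)."""
    return 2 * (n + 1) / math.sqrt(2 * n * (2 * n + 5))

for n in (5, 10, 15, 20):
    print(n, f"{r_rho_tau(n):.3f}")
```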

It would seem worthy of serious consideration, therefore, whether the
coefficient ρ might not be replaced by τ, in the sampling distribution of which
there is no uncertainty.

SUMMARY 


1. An expression is given for the sampling distribution of S(d²) in the universe
in which all rankings appear equally frequently, where the Spearman coefficient
of rank correlation is

$$\rho = 1 - \frac{6S(d^2)}{n^3 - n}.$$

2. The distribution is given explicitly for values up to and including n = 8.

3. It is suggested that for values of n less than 10 (and possibly higher) the
distribution is inadequately represented by the normal curve but that a B-curve
is sufficient to determine approximate significance points for values of n
greater than 7.

4. Some experimental distributions for n = 10 and n = 20 are given and
discussed. So far as they go these distributions support the theory.

5. A discussion is given of the relationship between ρ and a coefficient of rank
correlation τ suggested elsewhere. The correlation between the two appears to
be extremely high and in view of the fact that the sampling distribution of τ
may be easily obtained it is suggested that τ may be of greater practical value
than ρ.
APPENDIX
The Randomness of the Experimental Samples 

1. In testing the agreement between theory and the experimental data from 
Tippett’s table we used some results obtained as follows: 

Given a random series of n different objects, the average length of run required
to reach one of P (< n) stated objects is n/P.

For, if a start be made at any point in the series the chance that the first
object is one of the P is P/n, say p. The chance that the first is not one of
the P but the second is so, is (1 − p)p. The chance that the first (r − 1) are
not members of P and that the rth is so, is $(1-p)^{r-1}p$.

The total chance of obtaining one of P is

$$p\{1 + (1-p) + (1-p)^2 + \dots\} = \frac{p}{1-(1-p)} = 1,$$

as it should.

The average length of run is

$$p\{1 + 2(1-p) + 3(1-p)^2 + \dots\} = \frac{p}{[1-(1-p)]^2} = \frac{1}{p} = \frac{n}{P},$$

the result stated.

In other words, if there are at any stage P objects left to find to complete any
given set, the average length of run required is n/P. Moreover the occurrence
of each object is independent of that of the others. Hence the average length of
run required to give (n − 1) of the n objects composing the series is

$$n\left[\frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n}\right]. \tag{a}$$

In a similar way it will be seen that the variance (μ₂) of runs required to give
one of P objects is given by

$$\mu_2 + \frac{1}{p^2} = p\,[1 + 2^2(1-p) + 3^2(1-p)^2 + \dots] = \frac{2-p}{p^2},$$

so that

$$\mu_2 = \frac{1-p}{p^2} = \frac{n^2}{P^2} - \frac{n}{P}.$$

Since the runs are independent the variance of the run required to give (n − 1)
objects is

$$n^2\left[\frac{1}{2^2} + \frac{1}{3^2} + \dots + \frac{1}{n^2}\right] - n\left[\frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n}\right]. \tag{b}$$


2. For the case n = 10, formulae (a) and (b) give average run = 19.29, variance
= 35.69. For 2000 sets the average value of the run is therefore 19.29
with a standard error of √(35.69/2000), or 0.134.

The observed value was 19.59, which exceeds the expected value by about
2.2 times the standard error.
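
These figures can be reproduced numerically. The sketch below is illustrative and rests on an assumption about the form of formulae (a) and (b): the run needed with P objects still missing is treated as a geometric waiting time with mean n/P and variance (n/P)² − n/P, summed for P = n, n − 1, ..., 2. The function names are hypothetical.

```python
import math

def run_mean(n):
    """Formula (a): mean run needed to find n - 1 of the n objects."""
    return n * sum(1 / P for P in range(2, n + 1))

def run_variance(n):
    """Formula (b): variance of that run (sum of geometric variances)."""
    return sum((n / P) ** 2 - n / P for P in range(2, n + 1))

mean10, var10 = run_mean(10), run_variance(10)
se10 = math.sqrt(var10 / 2000)          # standard error for 2000 sets
excess = (19.59 - mean10) / se10        # observed mean run was 19.59
print(round(mean10, 2), round(var10, 2), round(se10, 3), round(excess, 1))
```

Only n − 1 objects need to be found because the last member of a permutation is determined once the others are known.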

This is rather too large for comfort. Possible sources of the difference are
(a) the non-randomness of Tippett’s numbers taken as a whole, (b) errors in
writing down the permutations.

It appears that errors of type (b) would tend on the whole to exaggerate the
length of run required since it is easier to overlook digits in the tables than
to imagine non-existent digits. Such errors, however, unless they are
systematically concerned with certain digits, which we regard as unlikely, will
not affect the randomness of the permutations. Nevertheless, we thought it wise
to eliminate this possible source of error in taking the sets of 20, and the
method to this end has been described in the foregoing paper.


3. An internal test on the permutations of 10 themselves revealed no significant
divergence from expectation. In one such test the numbers 1, 2, 3 were
extracted from each permutation and their order noted. The results for the
first 1920 permutations were:

Permutation    Frequency
123            302
132            296
213            339
231            327
312            323
321            333
Total          1920

The expected frequency in each class is 320, χ² = 4.65, P = 0.46 approx.
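
The χ² value and its probability can be recomputed from the six frequencies. This is a sketch only: the helper `chi2_sf_5df` is a hypothetical name, and it uses the closed form of the upper tail of χ² that holds for 5 degrees of freedom specifically (six classes less one), expressed through the normal tail and density.

```python
import math

observed = {"123": 302, "132": 296, "213": 339,
            "231": 327, "312": 323, "321": 333}
expected = sum(observed.values()) / len(observed)       # 320 per class
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())

def chi2_sf_5df(x):
    """P(chi-square with 5 d.f. > x); closed form valid for 5 d.f. only."""
    z = math.sqrt(x)
    upper_normal = 0.5 * math.erfc(z / math.sqrt(2))     # normal tail Q(z)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # normal density at z
    return 2 * upper_normal + 2 * density * (z + z ** 3 / 3)

print(round(chi2, 2), round(chi2_sf_5df(chi2), 2))
```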


Moreover, as has been pointed out in the text, the resulting distribution of
S(d²) conforms to expectation in its mean and variance.


4. Applying equations (a) and (b) when n = 20 we find
average run = 103.91 digits,
variance = 746.04 (digits)².

The standard error of 400 sets is therefore 1.37.

The observed average run was 106.75, in excess of the theoretical run by 2.07
times the standard error.
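
The figures for n = 20 can be checked in the same way. The sketch below makes an assumption that is not stated in this appendix: each of the 20 objects is taken to be read from the table as a pair of digits, so that a run measured in digits is twice the run measured in draws. The assumption is adopted here only because it reproduces the quoted mean and variance.

```python
import math

n, sets, digits_per_draw = 20, 400, 2   # digits_per_draw is an assumption

# Formulae (a) and (b), in units of draws
mean_draws = n * sum(1 / P for P in range(2, n + 1))
var_draws = sum((n / P) ** 2 - n / P for P in range(2, n + 1))

# Convert to digits
mean_digits = digits_per_draw * mean_draws
var_digits = digits_per_draw ** 2 * var_draws
se = math.sqrt(var_digits / sets)
excess = (106.75 - mean_digits) / se    # observed average run was 106.75

print(round(mean_digits, 2), round(var_digits, 2), round(se, 2), round(excess, 2))
```

The excess comes out at a little over two standard errors, in line with the figure quoted in the text.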

This result confirms our suspicion that Tippett’s numbers, taken as a whole,
are not quite a suitable random set.

Nevertheless, the resulting distribution of S(d²) conforms to expectation in
mean and variance.


5. To sum up, we are inclined to suspect that Tippett’s numbers may give
results not in accordance with expectation when the whole table is used. The
difference, however, is not greatly beyond permissible sampling limits. Moreover
the non-randomness of Tippett’s table, even if it exists, need not necessarily
affect the randomness of the permutations obtained from it or of the calculated
value of S(d²), and internal tests suggest that, in fact, it has not done so. We
feel, therefore, that the sample of values of S(d²) may be regarded as random
with some considerable confidence.
1. INTRODUCTION


In the study of distribution laws in statistics the two most important methods 
are those depending upon 

(a) The characteristic function;

(b) Transformation of variables.

Sometimes both these methods lead to certain definite integrals which are
not capable of being expressed in terms of simple functions. In such cases it is
a common practice to approximate the distribution by means of the very
well-known system of curves due to Karl Pearson.

In the present paper a method of deriving distribution laws from a slightly 
different point of view is developed. Certain theorems regarding this method 
are proved in §2, and the remaining sections are devoted to the application of 
these theorems to derive the distribution of several criteria that arise in the 
Theory of Sampling. 

The illustrations taken have been partly studied by S. S. Wilks (1932), who
expresses the distribution laws as multiple integrals which may readily be
evaluated in certain simple cases. The present method however gives the result 
as a single integral whose properties, from the mathematical point of view, may 
be studied by means of a differential equation that it satisfies. 


* The present paper is a modification of one of the papers submitted by the author for the 
Ph.D. Degree in Statistics of the University of London (1937). 
The distribution laws derived in the paper may appear as if they are pure
mathematical functions which cannot with advantage be handled by the practical
statistician. No doubt the distribution laws take a very complicated form, but
the author has taken the formula of § 3 and shown how this may conveniently be
used to calculate the levels of significance of the L₁ criterion. By this method
he has been able to check the substantial accuracy of the 5 % and 1 %
significance levels of the L₁ criterion obtained and tabled by P. P. N. Nayer
(1936) by an approximate method. A further paper setting out these results and
discussing their bearing on the accuracy of an analogous test suggested by
M. S. Bartlett (1937a, b) will be published shortly. The other distribution
laws may be utilized on similar lines to yield useful information, which
otherwise would be lacking.


2. CERTAIN THEOREMS REGARDING DISTRIBUTION FUNCTIONS 


(1) It is proposed to develop in this section a few theorems yielding distribu- 
tion functions. The method is based on the theory of Fourier’s transform, and the 
formula developed is due to Mellin. 

To avoid repetition we shall adopt the following notation: 

(a) x₁, ..., xₙ are variates continuous in the interval aᵢ < xᵢ < bᵢ (i = 1, 2,
..., n); aᵢ and bᵢ may be finite or infinite.

(b) p(x₁, x₂, ..., xₙ) is the probability law of the x’s so that

$$\int_{a_1}^{b_1}\!\dots\int_{a_n}^{b_n} p(x_1, \dots, x_n)\,dx_1 \dots dx_n = 1. \tag{1}$$

(c) θ₁, θ₂, ... are non-negative functions of the x’s. Any one of these will
be denoted by θ.

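(2) THEOREM 1. If

$$\phi(t) = \int_{a_1}^{b_1}\!\dots\int_{a_n}^{b_n} \theta^{t}\,p(x_1, \dots, x_n)\,dx_1 \dots dx_n, \tag{2}$$

then

$$p(\theta) = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} \theta^{-t-1}\,\phi(t)\,dt, \tag{3}$$

provided the integrals in (2) and (3) exist.*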
Proof. To prove this theorem we note that if u = log θ,


p(@) = (4) 
ay bn 
an 


* It will be seen that for convergence of the integrals in (2) and (3), it is enough that ¢(¢) 
belongs to Z,. The inversion formula (3) holds true even if ¢(t) belongs to L. 
Hence, using Fourier’s theorem,


p(u) = (6) 


Changing u to θ and it to t, we get


ico 
THEOREM 2. If 
b, 

(ts te) = | | (8) 

2 +iwo fio 


provided the integrals in (8) and (9) exist. 


This theorem may be proved by the same method adopted for theorem 1, 
but we give below a slightly different proof. 


Proof. Let p(@. | denote the probability law of 6, given 0,, so that 


P(9,,9) = (10) 
Ai (Bs 
Now = 04) 0,00, 
a 
é 
where g(ts) -| (13) 


In the integrals in (11) α₁ and β₁, α₂ and β₂ are the limits of θ₁ and θ₂, and
these may be finite or infinite. γ and δ in (13) may depend on θ₁.


Applying the inversion formula of theorem 1 to (12) and (13), we obtain 
l ico 


From (14) and (15) we get 


1 


—] fio ico 
‘47? P(t, ty) dt, dtg. (16) 


Hence the theorem follows from (10). 
THEOREM 3. If
Y(t, ty) | | P(%, Ln) dx,dx, dz,,, 
a, 
(17) 
1 
P(9,, 93) = eth at] | te) at, | (18) 


provided the integrals (17) and (18) exist. 


The proof of this theorem is obvious. It is also clear that the theorems 2 and 3 
may be extended to give the simultaneous distribution functions of any number 
of variates. 

It will be observed that φ(t) in theorem 1 gives the mathematical expectation
of the tth power of θ, and hence for positive integral values of t, φ(t) is the
tth moment of θ. But the integral (2) defines φ(t) for all values of t for which
the integral exists. Hence it is proposed to call φ(t) the moment function of θ.
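
A small numerical experiment may make the idea of a moment function concrete. The sketch below is not taken from the paper. It assumes the inversion formula has the usual Mellin form, p(θ) = (1/2πi) ∫ θ^(−t−1) φ(t) dt taken along the imaginary axis, and it uses an illustrative variate chosen for convenience: the product of two independent uniform variates on (0, 1), whose moment function is 1/(t + 1)² and whose probability law is −log θ. The function names and the integration settings are hypothetical.

```python
import cmath
import math

def moment_function(t):
    """phi(t) = E[theta^t] for theta = product of two independent
    uniform(0, 1) variates (an illustrative choice)."""
    return 1 / (t + 1) ** 2

def invert(theta, upper=400.0, steps=80000):
    """Trapezoidal estimate of (1 / 2 pi i) * integral of
    theta^(-t-1) phi(t) dt along the imaginary axis t = iy."""
    log_theta = math.log(theta)
    step = upper / steps

    def real_part(y):
        t = 1j * y
        return (cmath.exp(-(t + 1) * log_theta) * moment_function(t)).real

    # The integrand at -y is the conjugate of that at +y, so integrate
    # the real part over [0, upper] and double it.
    total = 0.5 * (real_part(0.0) + real_part(upper))
    total += sum(real_part(i * step) for i in range(1, steps))
    return 2 * total * step / (2 * math.pi)

for theta in (0.2, 0.5, 0.8):
    print(theta, round(invert(theta), 4), round(-math.log(theta), 4))
```

The singularity of this moment function lies at t = −1, on the negative axis, which is why the imaginary axis is an admissible path of integration in this example.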


(3) The integral (3) may conveniently be evaluated by the method of contour 
integration. But in certain cases it is found that a very easy method is afforded 
by considering it as the solution of a differential equation. This method is given 
below.* 


THEOREM 4. Suppose φ(t) in theorem 1 satisfies the following conditions:
(a) The singularities of φ(t) are all on the negative axis of t.
(b) There is a positive number a such that

$$\phi(t+a) = \frac{A(t)}{B(t)}\,\phi(t), \tag{19}$$

A(t) and B(t) being polynomials in t.
Under these conditions p(θ) defined by (3) is a solution of the differential equation


dé do 


Proof. Since the singularities of φ(t) are on the negative axis of t, we may
move the path of integration in (3) to (a − i∞, a + i∞) without changing the
value of the integral. Thus the equation (3) reduces to


2mi 
Replacing ¢ by ¢+a and using (19), we have 


1 ie 


(20) 


p(A) = 


a! 
= 0 t “By (22) 


* I am grateful to Dr G. Rasch for pointing out this method in the study of distribution 
functions. 
Now we use the following symbolic equations:*


(0.55) = on (055 + n) (24) 
From (23), 6-1 A(t) = 4(- (25) 


Substituting the result of (25) in (22) and assuming that the symbolic operator 
can be taken out of the integral sign, we have 


Performing the operation Bi - os i) on both sides of (26) and after simpli- 


fication with the help of (24), we get 


= A (- 055-1) P00), (27) 


which proves that p(θ) satisfies (20).

The solution of the differential equation (20) will have a certain number of
constants equal to the order of the equation. By a proper choice of these
constants the solution may be made identical to p(θ). This may be done by
equating the residue of the integrand in (3) at any of its poles to the
corresponding terms in the solution of (20).

The method of the differential equation may generally be employed when 
is of the form +1) T(b,) 


where a; is greater than zero for i = 1, 2,...,; for, in this case, the singularities 
of #(t) are the same as those of ’(a;+?) and these are at the points ¢ = —j—a,; 
where j is zero or any positive integer. Also 


(a; +t) 
1(6;+#)° 


Thus we have here a method of studying a certain type of integral equation 
that has been called by 8. 8. Wilks (1932, p. 474) a type B integral equation. 


P(t +1) = A(t) 


* See, for example, A. R. Forsyth (1921), theorems 1 and 2, p. 61. 
3. THE SAMPLING DISTRIBUTION OF THE NEYMAN-PEARSON L₁ CRITERION
IN THE CASE OF k SAMPLES OF EQUAL SIZE


(1) Let s₁, s₂, ..., sₖ* be the sample standard deviations from k normal
populations with the same standard deviation. Then L₁ is defined by Neyman &
Pearson (1931) as

$$L_1 = \frac{(s_1^2 s_2^2 \dots s_k^2)^{1/k}}{\frac{1}{k}\sum_{i=1}^{k} s_i^2}. \tag{29}$$


If d(¢) is the moment function of L,, 


¢(t) = 82) ds? ... ds? 
0 0 


m+ 


(30) 
Now, we apply the inversion formula of theorem 1 and get
n—1 n—1 
pe 
—io 
r ("+ ) 
To evaluate this integral, replace, by Then 
n—1 
L(t) 


2 


Clearly the poles of the integrand are the same as those of Γᵏ(t/k) and, except
for the pole at the origin, these lie on the negative axis of t. The path of
integration may therefore be changed to c − i∞, c + i∞, where c is any positive
number.


n—1 
(: 
where F(Z) = (34) 
re 
(>) 
l ctio + T(t/k) 
F, L = dt. 35 
2m k T(t) (35) 
* If x₁, x₂, ..., xₙ are the sample values, s² is the mean value of (xᵢ − x̄)²,
x̄ being the arithmetic mean of the x’s.

Thus p(L₁) splits into the product of two factors, one of which is independent
of the size of the sample.
T*(t/k) 

I(t) 


Putting L,/k equal to x and noting that, if y(t) = 


tk-1 
we get from theorem 4 the following differential equation satisfied by F,(L,): 


(ak) 1) 2) +k- 1)z («x)= (37) 


(2) To solve equation (37), assume 


x(t+k) = 


z2= 


(38) 
=0 
Substituting in (37) 
kk 
i=0 
= (39) 
i=0 

Equating to zero the lowest power of x, the indicial equation is 

(40) 


which gives p = 0 as a (k—1) multiple root. Further, the coefficients satisfy the 
recurrence formula 


with a; = 0 fori = 1, 2,...,(k—1). 
Thus the series (38) contains only terms of the type x*‘ and we may write 


(42) 
i=0 
I'(p+ik+1) ‘ 
I'(p +1) [(e +k) (p+ 2k) ... (o 


To get the complete solution, since the indicial equation has p = 0 as a (k—1) 
multiple root, we use the method of Frobenius and obtain 


where A,(p) = 


/9\h 
n=0 p=0 
where C,, (h = 0,1, ...,&—2) are constants. 
It may easily be proved that the series (42) is uniformly convergent in the 
interval 0<p and 0<ak<1. Hence differentiation term by term of this series 
is valid.
To evaluate the hth derivative of $A_i(\rho)\,x^{ik+\rho}$ we write


(45) 
Using the formula 
x 
log = log Ia) + + (46) 
where y(a) = — 
(47) 
? 
the exponent in (45) may be expanded and we get 
where 1 1 
= 
...(49) 
t) = 1) — Wj_1( 1 +(-1) (1454-45) 
H A; = D ik 50)* 
where 
b,(i) 0 0 
b,(2) b,(%) 0 
D,(i)=| bg (4) 2b,(i) b, (i) (51) 
(") denoting the binomial coefficient "C,. Hence 
To evaluate the constants C;, in (52) so that z becomes identical with F,(Z,), 
T*(t/k) 


we find the residue of a— 


To at ¢ = 0 and equate this to the corresponding 


t in (52). N : ¢ 
erms in (52). Now t/k) 


* See Appendix. 
Hence at t = 0 the pole is of order k − 1 and the residue is the coefficient of
$t^{k-2}$ in the expansion of


T*(t/k +1) 
which is equal to 
—logx -1 0 0 
dy —logx -1 0 
d, 2d, —logx ... 0 
k-3 k-3 
(54) 
1—ki- 
where d; => (j=2, 3, 
The corresponding terms in (52) are clearly 
(55) 
h=0 
Since b,(0) =logx and b,(0) = 6,(0) =...= 0, 
(55) reduces to 
k-2 
D,(0) = Cy log x + C,(log x)? + (56) 
h=0 


Equating (56) to (54) and putting loga = —6, the constants are given by the 
equation 


dy 0 
_ | dy 2dy nt 
k-3 k-3 

(57) 
Hence C+ D,(i) + + Cy Dy(i) ... + Cp 
ke 

= k), (58) 
where d,(t, k) -1 0 0
d,(i, k) <4 
=| k) dy (i, k) -i 


the elements of the determinant being defined by the following equations: 
d,(t, k) = 6,(%), 


d,(i, k) = 6,(i)+(- (1-5) ,(1). 
Thus F(L,) = = (61) 
Finally, from equations (34), (35) and (61), 


© 


p(Ly) = 


2 
When k = 2, the differential equation reduces to 


dz 
= 
4xz = 0. 
This is easily integrated and we get 


(63) 
The constant C is given by 
= 
C = residue of x atti =0 
= 4, 
Hence in this special case 
I'(n-1) LP? 
p(y) (64) 


a result which has been proved by P. P. N. Nayer (1936, p. 43). 


4. THE SAMPLING DISTRIBUTION OF THE L₁ CRITERION APPROPRIATE TO
k SAMPLES OF EQUAL SIZE FROM BI-VARIATE NORMAL POPULATIONS


(1) In this section we shall consider the sampling distribution of the likelihood 
criterion appropriate for k samples from bi-variate normal populations as 
developed by S. 8. Wilks (1932, pp. 489-90). We shall denote this criterion by A,, 
instead of using Wilks’ notation of Ai»), » being the number of variates. It will 

* See Appendix. 
+ The application of this series to numerical computation will be dealt with in a separate paper. 
be observed that in the general case this gives a single criterion to test the
hypothesis that the variances and the co-variances in a set of k multivariate
normal populations are equal.

Consider k p-variate normal populations.* Let

(1) $x_1, x_2, \dots, x_p$ be the p variates.

(2) $n_1, n_2, \dots, n_k$, the sizes of the k samples ($\sum n_b = N$).

(3) $x_{iab}$, the value of the ith variate for the ath individual in the bth sample.

(4) $\bar{x}_{ib}$, the mean of the ith variate in the bth sample.


1 ™ 
(5) 8: = (Liab — Xin) — 
ba=1 


1 


(7) v, the generalized variance for the sample ), i.e. the determinant | s, 5, |. 
(8) v the generalized variance for the sample obtained by pooling together 


all the & samples, i.e. v = | ¢;; |. 


k bn 
With this notation A, = (65) 
b=1 
(2) The tth moment of A, is given by Wilks (1932, p. 490) as 
N-—k+1-%) 
2 2 
(66) 
When the n’s are equal, (66) reduces to 
t) = ‘ 
2 2 2 
(67) 
Applying the inversion formula of theorem 1 to (67), we gett 
2 2 
(68) 


* I have followed Wilks’ notation in using the letter p for the number of
variates. This must not be confused with the p used to denote a probability
function.

By applying Stirling’s formula for I(x), we may prove that A(t)~ and hence the 
integral (68) exists. 
Putting ¢ = Le (68) becomes


dé. 


2 
(69) 


Changing A, to LZ, = Ai" and using the relation 
P(L,) = p(A,) nk, 


we get 
k(n—1)—j+1 
2 2 2 
(70) 
Now putting = we get 
(71) 
r k(n—1)—j+1 
p < 2 
where F\(L,) = 1 (72) 


2 


p-j 
t 


(73) 


Thus p(L₁) splits into the product of two factors one of which is independent
of the size of the sample. This result may be compared with that of equation (33).
To evaluate the integral for F,(L,), we might apply the method of theorem 4; 
this however is not particularly easy. We shall therefore apply the direct method 
of contour integration. It is evident that the poles of the integrand in (73) are 
given by oe 
fors=0,1,2,...,00; j= 1,2,3,...,p. (74) 


The occurrence of the double suffix s and j makes the actual expression for the 

residue highly involved. So we shall limit the discussion to the case p = 2 and 

show how the expression for the integral may be reduced to a manageable form. 
When p = 2 we get, after simplification, with the help of the formula 



286 Distribution Laws in Statistics 
I(kn—1—1) 


where F,(L,) = (77) 
F,(L,) = (z) (78) 
In F,(L,) the poles of the integrand are at ¢ = —ks for s = 0,1, 2,...,00. To 
evaluate the residue at —sk, we put t = —sk+0, so that the integrand may be 
reduced to ks—k+1 
j= 
Thus for s greater than or equal to unity, the pole is of order k— 1 and the residue 
R, is given by kk (ks—k+1)! 
ks 
where A,(s) 0 0 
A,(s) 2A,(s) A,(s) -l1 0 
D(s,k)= | A,(s) 3A,(s) 3A,(s) A,(s) -1... 0 
—3 | 
A;,_9(8) Jay 3(8) eee A,(s) | 
(81) 
the elements of the determinant being given by the following equations: 
ks—k+1 
Aje) 
j= 
A 1 ks— 3 
for 4 = 2,3,...,and = 1,2,3,...,@ 
At s = 0, the pole is of order & and the corresponding residue is 
kk 
Ry = (k—2)! (k— 1)! D(0, k), (83) 


where D(0, k) is a determinant similar to the one in equation (81) having (k— 1) 
rows, with A,(0) for A,(s), where 


k-2 
A,0) = logk-logL,-— > | 
j=1) 
k-2 ] 
ki- jas” 
(84) 


* This is obtained by a method similar to that for (54). 
kk [TD(0,k) © (ks—k+1)!


Hence F,(L,) = pit (- 1)*-2 k) us|. 


s= 


Finally, using equations (76), (77) and (85) we get 
_ _(kn-k-2)! 
p(L,) {(n i= 3) (k 2) ! 
D(0, k) 1)k-2 
When k = 2, the equation (86) takes a simple form: 
2 2 (2-1)! 
The above series may be simplified by integrating both sides of the identity 


_ 


(ks—k+1)! D(s, k) 


x s! 28 
© (2s—1 3)! pase 
Thus | —————— da 
1 
= -{ , Where 1—2? = y? 
va-ayl+y 


= log {1 +./(1—2?)} —log 2. 
It follows therefore that for k = 2 


(2n — 4)! 2n— log Li) 


a result given by Pearson & Wilks (1933, p. 367). 
5. THE INDEPENDENCE OF THE ARITHMETIC MEAN AND THE RATIO OF THE
GEOMETRIC TO THE ARITHMETIC MEAN OF SAMPLES DRAWN FROM A PEARSON
TYPE III POPULATION


Let x,, %g, ...,#,, be a sample from the population defined by 
Let 
g = (22%, ...2,,)", 
(90) 
We shall prove that L) = (91) 


We start with the simultaneous probability law p(%,g) and by transforming 
the variables to % and L, we get 


6) 
7) 
? ea 
| 
| ae 
| 
| 
| 
| 
To find p(x̄, g) consider the function


7) = | 


0 0 


0 0 


x ei dar, day... 


1 : 
gat Tin—-1 e—x(l+it/n) 
E (q) 0 


_ Tn) ( n 


Now we apply theorem 3 and we get 


io +00 


ng l ia l 


(94) 
Since = for b positive and Riz) >0* 
ince Te or 6 positive and R(x) > 
T 


From equations (92) and (95) we get
p(x̄, L) = p(x̄) p(L).
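
The independence result can be illustrated, though of course not proved, by simulation. The sketch below is an addition for illustration only: it takes the Pearson Type III population to be a gamma distribution with an arbitrarily chosen shape parameter, draws repeated samples, and computes the sample correlation between the arithmetic mean and the ratio of the geometric to the arithmetic mean. A correlation near zero is a necessary consequence of independence, not a sufficient test of it. The function name, shape, sample size and seed are all hypothetical choices.

```python
import math
import random

def simulate(shape=2.0, n=5, trials=20000, seed=1):
    """Sample correlation between the arithmetic mean and the ratio
    (geometric mean / arithmetic mean) over repeated gamma samples."""
    rng = random.Random(seed)
    means, ratios = [], []
    for _ in range(trials):
        xs = [rng.gammavariate(shape, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        geo = math.exp(sum(math.log(x) for x in xs) / n)
        means.append(mean)
        ratios.append(geo / mean)
    mx, my = sum(means) / trials, sum(ratios) / trials
    sxy = sum((a - mx) * (b - my) for a, b in zip(means, ratios))
    sxx = sum((a - mx) ** 2 for a in means)
    syy = sum((b - my) ** 2 for b in ratios)
    return sxy / math.sqrt(sxx * syy)

print(round(simulate(), 3))
```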
6. S. S. WILKS’ B INTEGRAL EQUATIONS AND THE SAMPLING
DISTRIBUTION OF CERTAIN CRITERIA DISCUSSED BY HIM


(1) The type B integral equation as defined by Wilks is given by 


I(a;) I'(t+,) 
f(z) dx = = 96 
f(z) dx = Ti+a,) (96) 
where a;>6;>0. Let us assume that 2(a;—6;) > 1. With this condition it will be 
seen that ¢(t) belongs to L,. Hence we may apply theorem | and write } 
ico 


* N. Nielsen (1906), p. 155. 
Hence all the conditions of theorem 4 are satisfied and f(z) satisfies the differential
equation


j=1 

As applications of this principle of solving integral equations, we give below 
the distribution of the several criteria discussed by Wilks. The notation is the 
same as that of Wilks and no attempt at defining the criteria is made. 


(2) The generalized correlation ratio, U. 
The tth moment of U is given by Wilks (1932, p. 484) as 


mr? 
Hence the distribution of U is given by 
p(U) = TI (101) 
j=1 2mi J 2) 
2 2 
Replacing ¢+ 4(p—n) by ¢ and putting N —p = m, 
U n r( 2 
p—n)— 
0) = 
2 
P(t+})... r(t+">) 
2 
m m+1 m+n—1 
2 2 2 
(102) 
Denoting the integral on the right side by F(U), F(U) satisfies the differential 
equation 
d d ad m+n-3 
(v du. 2 \(lap 2 )...(0 2 )y 
d d ') 
= zo(" ap 3)(v dU ) (v dU (103) 
When » = 1, (103) reduces to 
dy 
qo 5 (104) 
which gives (105) 


* See footnote to p. 284 above regarding use of the letter p. 
I(t)


To find C we compare the residue of U-* — 
r(t+3) 


at ¢ = 0 with the corresponding 


term in (105). Thus C = Hence 


T(m/2)’ 


r(=>) 


p—1\,(N-p 


This result has been given by Wilks. 
When n = 2, (103) reduces to 


(106) 


d* d m—1)(m—2 
from which as in the previous case we get 
--2 : 


Wilks gives this in the form of a hypergeometric series. For n > 2, F(U) cannot 
be expressed in any simple form, but its value may be obtained as a series. 
(3) Generalization of 1—?, W. 
The tth moment of W is given by Wilks (1932, p. 486) as 


N-j N-p-j+l 


2 


Since (109) may be obtained from (100) by replacing p by N − p + 1 in the latter, 
the distribution of W may be readily inferred from that of U. 
(4) Generalization of “Student’s” ratio, Y. 
The tth moment of Y (Wilks, 1932, p. 488) is 


(3) r(t+"5") 


N OF 
Hence = (3) : ‘ie ak, (111) 


r(=;*) 


ven = 
: 
vine 
q 
‘ 
are 
= 
p(U 
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Replacing ¢ + 3(N by 


r N —n J n 
+3 
It may be proved that the integral in (112) satisfies the differential equation 
d mn dz 


Solving this and finding the constants so as to make the solution identical to 


p(Y), we get IN 
(3) 


2 
This result is given by Wilks. 
(5) Ratios of determinants of correlation coefficients, w. 
The tth moment of w is given by Wilks (1932, p. 491) as 1 
j=2 ~ 
*) II + t) 
Hence p(w) = 5 - dt. 
N — N 1 
: i) Mt +t) 
=2 - 
Putting $t$ instead of $t + \tfrac{1}{2}(N-n)$ we get 
n 
( 2 l io+}(N—n) 2 +t) 
p(w) = /N-j\ win —n)-1 wt dt. 
— —io+iN—n) rea 
ius") 
(117) 


The integral in (117) satisfies the differential equation 


N 
Hey 
| 
| 

| 
For n = 2, 3 this equation is readily solved, giving respectively 


r 


Equation (120) may be compared with the one given by Wilks in the paper cited 
(1932, p. 492, 49). 


7. CONCLUSION 
The main object of the paper has been to develop the study of distribution 
laws of statistics whose moment function can be evaluated. The method gives 
an elegant mathematical solution to Wilks’ type B integral equation. A detailed 
study of these functions from the mathematical point of view is made possible 
since the differential equation that they satisfy can readily be written down. 


In conclusion, the author wishes to thank Professor E. S. Pearson under whose 
suggestion and guidance the present paper was written. The author is also 
indebted to Dr G. Rasch for considerable help he has had in developing the ideas 
of §2, and to Dr R. C. Geary for a number of suggestions which have improved 
the final form of the paper. 


8. APPENDIX. DETERMINANTS ARISING OUT OF THE SUCCESSIVE 
DERIVATIVES OF EXPONENTIAL FUNCTIONS 


(1) Let (a;) be a sequence of numbers, finite or infinite, and let 


e 


F(a; t; dg, ...) =e (1) 
Then the nth derivative of F(x; t; ag, a3, ...) at t = Ois 
—1 0 0 0 0 
ay a 0 0 
D,(«,a) =} a, 3a, 3a, x 0 |. 


| 
| 
4 
4 
4 
4 
2 I 
n—1 
n n—1 n—-2 eee eee see 
ok 
To prove this, we differentiate both sides of (1), so that 


2 
F,(2; t; a) = t; a) | 


Using Leibnitz’ rule on successive derivatives to (3) 


a 


n—1 | 
F,(@; = (" . ) Py t; @) ...]. 
i=0 
Putting $t = 0$ in (4) and giving $n$ the values 1, 2, 3, ..., n and eliminating 
$F_1, F_2, \ldots, F_{n-1}$, we get (2). 
(2) If co, c,, ...,¢, are constants such that 


= D,(2,b), 
then 


(1) = D,(x,a+6). 
(2) ¢9—c¢,D,(x, a) D,(x,a) = (—1)"D, (x 
To prove (6), we note that 
D,(z, 6) = F(z; 0; 6) = (5) F(x; t; 6). 
Hence from (1), n ; d\" 
> = (5) F(x; t; 6). 
i=0 dt} 
Using (8), we write (6) as 


\t=0 
| 


| 


t=6=0 


d d 
wee = 
F(0; 0; b) F(x; t; a) 


(") (53) F(0; 0; b) Fe: t: a) 
(5) F(0; t; a) 


= (5) Fo: t; b) F(x; t; 


t=0 


- (5) F(x; t; a+b)) 
= D,(x; a+b). 


=0 


| 

: 

— ] 
| 

a 

| 

e “F(0; 0; b) F(x; t;a | 

i 

| 
| 
To prove (7), we have 


i= t=0 


ad 
= (55) € "at F(0; 0; b) F(a; t; a)| 


;t=6=0 


t=0 


e F(0; —0; b) F(a; t; a)| 
|t=0=0 


( 
+5) F(0; —0; b) F(x; t; a) 


t=6=0 


| t=6=0 


F(0; —6; 6) (5) Fe: t; a)| 


t; a; 


a, +(—1)*,). 


> 
1. INTRODUCTION 


WE consider a normal population which is specified by two sets of p and q variates 
respectively, whose variances and covariances are arranged in a matrix 

\[
R = \begin{bmatrix} R_{pp} & R_{pq} \\ R_{qp} & R_{qq} \end{bmatrix}, \tag{1}
\]

which hereafter will be called the variance matrix of the p+q variates. To dis- 
tinguish the two sets of variates we have written the matrix R in a partitioned 
form; thus $R_{pp}$ is the variance matrix of the first set and $R_{qq}$ that of the second 
set, while $R_{pq}$ contains as elements the pq covariances between any variate of 
the first set and any variate of the second set. We have also put 

\[
R_{qp} = R'_{pq}, \tag{2}
\]

a convention to which we shall adhere in the case of all rectangular matrices 
whose orders are indicated by the suffixes p and q. 

Suppose now that a selection is carried out in the population in such a way 
that (i) all variates remain normally distributed, and (ii) that the variance 
matrix of the first set is changed from $R_{pp}$ to $V_{pp}$, which may be any preassigned 
matrix, provided it is symmetrical and positive definite. Owing to the statistical 
dependence between the p+q characters, the other variances and covariances 
will also be modified, and it is known that the variance matrix after selection 
is given by 

\[
V = \begin{bmatrix}
V_{pp} & V_{pp}R_{pp}^{-1}R_{pq} \\
R_{qp}R_{pp}^{-1}V_{pp} & R_{qq} - R_{qp}\,(R_{pp}^{-1} - R_{pp}^{-1}V_{pp}R_{pp}^{-1})\,R_{pq}
\end{bmatrix}. \tag{3}
\]


This problem was first solved by K. Pearson (1903). The matrix form in which we 
have quoted his result is due to Aitken (1935, 1936). The above formula can be 
obtained without any reference to the statistical method by which the change 
of the variances and covariances in the first set is effected. 
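
A minimal computational sketch of the selection formula may help fix ideas. It assumes the partitioned form quoted above, in which the new covariances are $V_{pp}R_{pp}^{-1}R_{pq}$ and the new variance matrix of the second set is $R_{qq} - R_{qp}(R_{pp}^{-1} - R_{pp}^{-1}V_{pp}R_{pp}^{-1})R_{pq}$. The function name and the numerical values are hypothetical and serve only as an example.

```python
import numpy as np

def selected_variance_matrix(R, V_pp):
    """Variance matrix after selection on the first p variates.

    R is the full (p+q) x (p+q) variance matrix before selection and
    V_pp the preassigned variance matrix of the first set.
    """
    p = V_pp.shape[0]
    R_pp, R_pq = R[:p, :p], R[:p, p:]
    R_qp, R_qq = R[p:, :p], R[p:, p:]
    B = np.linalg.solve(R_pp, R_pq)            # R_pp^{-1} R_pq
    V_pq = V_pp @ B                            # V_pp R_pp^{-1} R_pq
    # R_qq - R_qp (R_pp^{-1} - R_pp^{-1} V_pp R_pp^{-1}) R_pq, expanded:
    V_qq = R_qq - R_qp @ B + B.T @ V_pp @ B
    return np.block([[V_pp, V_pq], [V_pq.T, V_qq]])

# Illustrative values only: one directly selected variate, two others.
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
V = selected_variance_matrix(R, np.array([[0.5]]))
print(V)
```

When $V_{pp}$ is set equal to $R_{pp}$ the function returns R itself, as it should when no selection takes place.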

It is the object of this paper to show that selection can be regarded as the 
limiting case of a certain regression problem with respect to the population of 
variance matrices computed for all possible samples of individuals: suppose 
that the variance matrix for an arbitrary sample of n persons is 

\[
Z = \begin{bmatrix} Z_{pp} & Z_{pq} \\ Z_{qp} & Z_{qq} \end{bmatrix}.
\]

The matrix Z will, of course, vary from one sample to another and will also 
depend on n, the number of persons in the sample; in other words we shall 
obtain a population of matrices Z which will possess a certain distribution law 
(see §5 below). Consider now the subpopulation or “array” of those matrices 
Z in which the first submatrix $Z_{pp}$ is equal to a given matrix $V_{pp}$. Our task will 
then be to find the mean value or “expected” value V* of this array. Evidently 
V* will be a function of $V_{pp}$. The chief result is that the mean V* of this sub- 
population of Z-matrices tends to the matrix V (equation (3)) as n tends to 
infinity. Thus selection in Pearson’s sense means finding the average value of the 
variance matrix with respect to the population of all possible infinite normal samples 
which are subject to the condition that the variance matrix of the first set of variates 
is equal to the preassigned matrix $V_{pp}$. 

This idea was communicated to the present writer by Prof. Godfrey H. 
Thomson, who has discussed some of the consequences elsewhere (1939). In 
this paper† we propose to give an analytical proof of Prof. Thomson’s statement 
by deriving an explicit formula for the average of the variance matrix under the 
conditions referred to. 


2. MOMENT GENERATING FUNCTION FOR AN ARRAY 
Consider two sets of variates 

\[
x = \{x_1, x_2, \ldots, x_p\} \quad\text{and}\quad y = \{y_1, y_2, \ldots, y_q\},
\]

which are envisaged as column vectors of orders p and q respectively, and 
suppose that their frequency differential is given by 

\[
\phi(x, y)\,dx\,dy,
\]

where dx stands for $dx_1, dx_2, \ldots, dx_p$ and dy for $dy_1, dy_2, \ldots, dy_q$. The moment 
generating function of x and y is then 

\[
g(t, s) = \iint \phi(x, y)\, e^{i(t'x + s'y)}\,dx\,dy, \tag{G}
\]


† The author wishes to express his thanks to Prof. Godfrey H. Thomson for suggesting this 
problem to him. He is also indebted to a referee for making some valuable criticisms, especially 
in connexion with the subject of § 4 below. 


where the vectors t and s are the moment carrying variables representing x and 
y respectively; the accent, as usual, denotes the transposition of a matrix, so 
that t′ is a row vector and 

\[
t'x = \sum_{i=1}^{p} t_i x_i, \quad\text{and similarly}\quad s'y = \sum_{i=1}^{q} s_i y_i.
\]
We now consider an x-array of the variates, i.e. we assign some constant values 
y to the variates y. The distribution function of the remaining variates x is then 


evidently given by 
$*(x) = = 9); 
max 


and the corresponding moment generating function becomes 


On the other hand, using the Fourier integral theorem on equation (G) 
we find reo 


whence, comparing the last equation with (G*), we obtain the result 


g*(t) = const.[ 


(4) 

When working with moment generating functions it should be borne in mind 
that the constant term, i.e. the term independent of the moment carrying 
symbols, is always equal to unity. Hence throughout the analysis we can neglect 
any non-zero multiplicative constant; and in the final result we can restore the 
correct constant by making the first term in the expansion of the moment 
generating function equal to unity. 


3. SOME LEMMAS ON MATRICES AND DETERMINANTS 


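(i) Let 

\[
S = \begin{bmatrix} S_{pp} & S_{pq} \\ S_{qp} & S_{qq} \end{bmatrix}
\]

be any square matrix which is partitioned as shown, the suffixes indicating the 
number of rows and columns for each of the four submatrices, and suppose that 
$|S_{qq}| \neq 0$. It is then easy to verify the matrix identity 

\[
\begin{bmatrix} I & -S_{pq}S_{qq}^{-1} \\ O & I \end{bmatrix} S
= \begin{bmatrix} S_{pp} - S_{pq}S_{qq}^{-1}S_{qp} & O \\ S_{qp} & S_{qq} \end{bmatrix},
\]

whence on taking determinants 

\[
|S| = |S_{qq}|\,\bigl|S_{pp} - S_{pq}S_{qq}^{-1}S_{qp}\bigr|. \tag{5}
\]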
(ii) If $X = [x_{\alpha\beta}]$ be any matrix, we shall use the symbol $O(x^2)$ to denote any 
function (scalar or matrix function) of the $x_{\alpha\beta}$ which, when expanded as a power 
series, involves only terms which are at least of the second order in the $x_{\alpha\beta}$. 
E.g. for an arbitrary square matrix X we have 

\[
|I + X| = 1 + \operatorname{tr} X + O(x^2), \tag{6}
\]

where $\operatorname{tr} X = \sum_{\alpha} x_{\alpha\alpha}$ 
denotes the “trace” of X. 
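(iii) We shall frequently use the relations 

\[
\operatorname{tr}\{AB\} = \operatorname{tr}\{BA\}, \qquad
\operatorname{tr}\{ABC\} = \operatorname{tr}\{BCA\} = \operatorname{tr}\{CAB\}, \tag{7}
\]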


the general rule being that the trace of a product of matrices is unaltered when 
the factors are permuted in cyclical order. 
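
The three lemmas can be checked numerically. The following is only a sketch under stated assumptions: lemma (i) is taken in the determinant form $|S| = |S_{qq}|\,|S_{pp} - S_{pq}S_{qq}^{-1}S_{qp}|$, lemma (ii) as $|I + X| = 1 + \operatorname{tr} X + O(x^2)$, and lemma (iii) as the cyclic rule for traces. The matrices are arbitrary test values, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 3

# Lemma (i): determinant of a partitioned matrix (requires |S_qq| != 0).
S = rng.normal(size=(p + q, p + q))
S_pp, S_pq = S[:p, :p], S[:p, p:]
S_qp, S_qq = S[p:, :p], S[p:, p:]
schur = S_pp - S_pq @ np.linalg.solve(S_qq, S_qp)
det_direct = np.linalg.det(S)
det_blocks = np.linalg.det(S_qq) * np.linalg.det(schur)

# Lemma (ii): the remainder |I + eps X| - (1 + eps tr X) is of second order,
# so shrinking eps tenfold should shrink it about a hundredfold.
X = np.array([[1.0, 2.0], [3.0, 4.0]])

def remainder(eps):
    return abs(np.linalg.det(np.eye(2) + eps * X) - (1.0 + eps * np.trace(X)))

ratio = remainder(1e-2) / remainder(1e-3)

# Lemma (iii): the trace is unchanged by cyclic permutation of the factors.
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 2))
traces = [np.trace(A @ B @ C), np.trace(B @ C @ A), np.trace(C @ A @ B)]

print(det_direct, det_blocks)
print(ratio)
print(traces)
```

Note that the three products in lemma (iii) have different orders (2, 3 and 4 here), yet their traces coincide.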


4. INGHAM’S INTEGRAL 


Let $U = [u_{\alpha\beta}]$ and $V = [v_{\alpha\beta}]$ be any given positive definite matrices of order p, 
and let $T = [t_{\alpha\beta}]$ be a variable symmetrical matrix whose ½p(p+1) distinct 
elements are regarded as independent variables. Then A. E. Ingham has proved 
(1933) that 


(;,] | |-*e dT = J (8) 
p-1 
where J(V,h) = (2 | V \r(»-§)| (9) 
\ = 


The integral (8) is a ½p(p+1)-fold integral to be extended over the ½p(p+1) 
distinct elements of the symmetrical matrix T; accordingly we have introduced 
the abbreviation $dT = \prod_{\alpha \le \beta} dt_{\alpha\beta}$. 

Ingham has shown that the integral converges absolutely when h> $p(p+1). 
This condition will in general be fulfilled in our problem, because h will be 
identified with ½(n−1) where n is the number of persons in a sample, and p will 
be the number of directly selected tests. 

Further, we shall need for our purpose to extend the validity of (8) to the 
case where the matrix U has complex numbers as its elements, provided that 
the real parts of the elements form a positive definite matrix. This is easily done. 
Suppose that $U = U_1 + iU_2$, 

where $U_1$ and $U_2$ are real symmetric matrices and where $U_1$ is positive definite. 
We have then for the left-hand side of (8), the integral 


1 po 


4 
=f 
> 
| 
- 
A change of origin $W = T - U_2$, 

where W is the matrix of the new variables $w_{11}, w_{12}, \ldots, w_{pp}$, gives 


2a 
In the last integral $U_1$ is real and positive definite, so according to (8), we obtain 
the result J(V, h), 
i.e. JV, h), 


which is exactly the same as the right-hand side of (8). The desired extension 
is therefore achieved. 

For our purpose it is sufficient to know that the expression (9) is independent 
of U, and we shall write the result in the form 


| | U-iT dT = const. 


the elements of the matrix V being treated as constants. 


5. SAMPLING DISTRIBUTION OF VARIANCES AND COVARIANCES 


We consider all possible samples of n individuals drawn from a (p+q)-variate 
normal population. Each sample will have its own variance matrix 

\[
Z = \begin{bmatrix} Z_{pp} & Z_{pq} \\ Z_{qp} & Z_{qq} \end{bmatrix}.
\]

As we pass from one sample to another the matrices Z will form a population 
whose distribution function has been worked out (Wishart & Bartlett, 1933). 
The moment generating function of this distribution can be written in the 
form (loc. cit. p. 269) 
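\[
g(T) = |R|^{-\frac{1}{2}(n-1)}\,\Bigl|R^{-1} - \frac{2i}{n}T\Bigr|^{-\frac{1}{2}(n-1)}, \tag{11}
\]

where R is the variance matrix of the original population. The symbols $it_{\alpha\alpha}$ and 
$2it_{\alpha\beta}$ $(\alpha < \beta)$ are the moment carrying variables for the variances $z_{\alpha\alpha}$ and the 
covariances $z_{\alpha\beta}$ $(\alpha < \beta)$ respectively. Thus, if the expansion of g(T) as far as linear 
terms in t be 

\[
g = 1 + i\sum_{\alpha} w_{\alpha\alpha}t_{\alpha\alpha} + 2i\sum_{\alpha<\beta} w_{\alpha\beta}t_{\alpha\beta} + \ldots,
\]

it would follow that the mean value of $z_{\alpha\alpha}$ is $w_{\alpha\alpha}$, and that the mean value of 
$z_{\alpha\beta}$ is $w_{\alpha\beta}$. The last equation can be more conveniently written in the form 

\[
g(T) = 1 + i\,\operatorname{tr}(TW) + O(t^2). \tag{12}
\]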
Incidentally, with this notation it is quite easy to deduce the well-known result 

\[
W = \frac{n-1}{n}\,R.
\]

For we can rewrite (11) thus 

\[
g(T) = \Bigl|I - \frac{2i}{n}TR\Bigr|^{-\frac{1}{2}(n-1)}
     = \Bigl(1 - \frac{2i}{n}\operatorname{tr}(TR) + O(t^2)\Bigr)^{-\frac{1}{2}(n-1)}
\]

by (6), p. 298. Expanding the expression on the right-hand side we obtain 

\[
g(T) = 1 + i\,\frac{n-1}{n}\operatorname{tr}(TR) + O(t^2).
\]

Therefore, by comparing this with (12), $W = \frac{n-1}{n}R$. 
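
The factor (n − 1)/n in the mean of the sample variance matrix can be illustrated by simulation. The sketch below assumes that the sample variance matrix is computed about the sample means with divisor n, which is the convention under which that factor arises; the population matrix R, the sample size and the number of samples are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population variance matrix (symmetric, positive definite).
R = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
n, n_samples = 5, 20000

# x has shape (n_samples, n, 3): n_samples samples of n individuals each.
x = rng.multivariate_normal(np.zeros(3), R, size=(n_samples, n))
d = x - x.mean(axis=1, keepdims=True)          # deviations from sample means

# Variance matrix of each sample, with divisor n (assumed convention).
Z = np.einsum('sni,snj->sij', d, d) / n

Z_mean = Z.mean(axis=0)
expected = (n - 1) / n * R
print(np.round(Z_mean, 3))
print(np.round(expected, 3))
```

With these settings the two printed matrices should agree to within sampling error, whereas R itself differs from the simulated mean by the factor n/(n − 1).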


6. MOMENT GENERATING FUNCTION FOR AN ARRAY $Z_{pp} = V_{pp}$ 


We now consider the array of the Z-distribution in which the variables 
$Z_{pp}$ have assigned values† $V_{pp}$. According to the results of §2, the moment 
generating function g*(T) for this subpopulation of Z-matrices is obtained from 
g(T) by applying a Fourier transformation with respect to those variables which 
are kept constant. Thus 


(13) 


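The integration refers to the ½p(p+1) elements of the symmetrical matrix $T_{pp}$; 
we have suppressed the normalizing factor of the integral and the constant 
factor $|R|^{-\frac{1}{2}(n-1)}$ of the function g(T). In order to evaluate the integral we 
temporarily put 

\[
R^{-1} = Q = \begin{bmatrix} Q_{pp} & Q_{pq} \\ Q_{qp} & Q_{qq} \end{bmatrix} \tag{14}
\]

and 

\[
S = Q - \frac{2i}{n}T =
\begin{bmatrix}
Q_{pp} - \frac{2i}{n}T_{pp} & Q_{pq} - \frac{2i}{n}T_{pq} \\
Q_{qp} - \frac{2i}{n}T_{qp} & Q_{qq} - \frac{2i}{n}T_{qq}
\end{bmatrix}. \tag{15}
\]

Hence by (5), p. 297, 

\[
|S| = |S_{qq}|\,\bigl|S_{pp} - S_{pq}S_{qq}^{-1}S_{qp}\bigr|.
\]
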


Substituting this in (13) and noting that $S_{qq}$ is independent of the variables of 
integration we find 


Now let the matrix $U_{pp}$ be defined such that 


2 


(16) 


† The matrix $V_{pp}$ is, of course, symmetrical and positive definite. 


La 

| 

| 


WALTER LEDERMANN 301 


which is constant with respect to the integration. Then 


2 2\" 


and the moment generating function of the array becomes 


‘ap | = 


| | | | Upp | AT 


pp’ 


where numerical factors have been ignored. The integral on the right-hand side 


of the last equation is precisely of the type discussed by Ingham, whence by 
(10), p. 299, we can write 


On the other hand, the expansion of g*(7') must be of the form 
g*(T) = 1+ +t tr + Ole), (18) 


there being no term in $T_{pp}$, since the corresponding variables are now fixed. Our 
object in the next section will be to expand (17) as far as linear terms in the $t_{\alpha\beta}$; 
a comparison with (18) will then immediately yield the mean values of the 
variables $Z_{pq}$ and $Z_{qq}$ in the array $Z_{pp} = V_{pp}$. 

In order to justify the application of the extended form of Ingham’s integral 
in our case we still have to show that the real part of the matrix U,,, defined in 
equation (16) is positive definite. Denoting this matrix by U®,, it is seen that 
the elements of U®), are real continuous functions of the elements of the matrices 
T,, and T,,. At the point 7, = 0 and 7), = 0, the matrix U%), is reduced to, 
say, where 


pp? 2 


in accordance with (15) and (16). It is sufficient for our purpose to show that 
U®) is positive definite. For then, by continuity, U®, will remain positive at 
least for a certain range of values 7,40 and 7, #0, and in the expansion (18) 
the independent variables may be restricted to as small a range as we please. 

In order to show the positive definiteness of U®,, we express the right-hand 
side of (19) in terms of the matrix R as follows: by (14) we have 


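\[
I = RQ,
\]

i.e. 

\[
\begin{bmatrix} I & O \\ O & I \end{bmatrix} =
\begin{bmatrix}
R_{pp}Q_{pp} + R_{pq}Q_{qp} & R_{pp}Q_{pq} + R_{pq}Q_{qq} \\
R_{qp}Q_{pp} + R_{qq}Q_{qp} & R_{qp}Q_{pq} + R_{qq}Q_{qq}
\end{bmatrix}.
\]

Hence $O = R_{pp}Q_{pq} + R_{pq}Q_{qq}$, or 

\[
R_{pp}^{-1}R_{pq} = -Q_{pq}Q_{qq}^{-1}, \tag{20}
\]

and $I = R_{pp}Q_{pp} + R_{pq}Q_{qp}$, so that $R_{pp}^{-1} = Q_{pp} + R_{pp}^{-1}R_{pq}Q_{qp}$, 
whence by (20) 

\[
R_{pp}^{-1} = Q_{pp} - Q_{pq}Q_{qq}^{-1}Q_{qp}. \tag{21}
\]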
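
The two identities between the blocks of a matrix and the blocks of its inverse can be confirmed on a numerical example. This is a sketch only: it assumes the identities in the form $R_{pp}^{-1}R_{pq} = -Q_{pq}Q_{qq}^{-1}$ and $R_{pp}^{-1} = Q_{pp} - Q_{pq}Q_{qq}^{-1}Q_{qp}$ with $Q = R^{-1}$, and the matrix R is an arbitrary positive definite test matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 2, 3

# Arbitrary symmetric positive definite matrix standing in for R.
A = rng.normal(size=(p + q, p + q))
R = A @ A.T + (p + q) * np.eye(p + q)
Q = np.linalg.inv(R)

R_pp, R_pq = R[:p, :p], R[:p, p:]
Q_pp, Q_pq = Q[:p, :p], Q[:p, p:]
Q_qp, Q_qq = Q[p:, :p], Q[p:, p:]

# Identity of the form (20): R_pp^{-1} R_pq = -Q_pq Q_qq^{-1}
lhs20 = np.linalg.solve(R_pp, R_pq)
rhs20 = -Q_pq @ np.linalg.inv(Q_qq)

# Identity of the form (21): R_pp^{-1} = Q_pp - Q_pq Q_qq^{-1} Q_qp
lhs21 = np.linalg.inv(R_pp)
rhs21 = Q_pp - Q_pq @ np.linalg.inv(Q_qq) @ Q_qp

print(np.max(np.abs(lhs20 - rhs20)), np.max(np.abs(lhs21 - rhs21)))
```

Both printed discrepancies should be at the level of rounding error.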


Biometrika xxx 20 


| 

| 
Similarly, we can deduce the identity 


On substituting (21) in (19) we see that 

\[
U^{(0)}_{pp} = \tfrac{1}{2}\,n\,R_{pp}^{-1}.
\]

But since R is a positive definite matrix, so are $R_{pp}$ and $R_{pp}^{-1}$, and consequently 
also $U^{(0)}_{pp}$. 

7. THE LINEAR TERMS OF THE MOMENT GENERATING FUNCTION 


In order to find the linear terms of the function g*(7') (equation (17), p. 301) 


we write g*(T) x -fo, 
where fi = | 
and fo = pp nr, 


and expand each factor separately. First we have, by (15), 


|—3(n—1) —Kn—1) 
| S, | n — Tq 


> 


—i(n—1) 


| Soa | Qua {2 tr Tq) 


| Soq = | ite (Qa Tya) + or)| 
by an argument similar to that used at the end of § 5, thus 


fix 


Next, consider = 
where, by (15) and (16) 

2 2% 2% 2% 


n 


since for matrices with sufficiently small elements 
Hence (Cu = qq + O(#), 


and the expression for U,,,, becomes 


2 2i 2i 


| 
| 
a 
After expanding and collecting terms which are linear in the $t_{\alpha\beta}$ we obtain 


n op = C+ n Qa Qa Qap + Qa Lav} Vay O(?), 


where C is a certain constant matrix whose trace we shall denote by c. When 
taking the trace of each side in the last equation we shall rearrange the factors 
in every term in such a way that 77, or T/,, occupies the first place and T,,, occupies 


the last place. This can always be done by a cyclical permutation of the factors. 
Thus 


2 2% 


Now the third term in the square bracket is equal to the first term, since the 
trace of a matrix is equal to that of its transpose. Hence 

tr (U, = + 20 tr Yop) — (Tq Pan.) + 
and consequently 


= oc 1 — Qi tr 
+6 tr Gor) + OF). (24) 
For the term ½nc merely contributes a numerical factor, and generally we have 

\[
e^{\operatorname{tr} X + O(x^2)} = 1 + \operatorname{tr} X + O(x^2).
\]

On combining the results (23) and (24) we find 


g*(T) = tr (T,,, Qa 
+itr BC Opa + ar) | (25) 


The two members of (25) are exactly equal, and not only proportional, 
since the first term on the right-hand side is equal to unity (§ 2). By comparing 
(25) with (18), p. 301, we can now read off the required mean values of the arrays 
in the Z-distribution, namely, 


Vow = — 

The first of these relations becomes after transposition 


It only remains to express the result in terms of R instead of Q. This can be 


done with the aid of (20) and (22), pp. 301, 302. Thus we finally get 
y n — 


| 
| 
20-2 
It is remarkable that the value of $V^*_{pq}$ should be independent of n, i.e. in 
the array $Z_{pp} = V_{pp}$ of the Z-population, the mean value of $Z_{pq}$ is independent 
of the size of the sample (provided, however, that the inequality referred to on 
p. 298 is satisfied). 

When $n \to \infty$, the matrices $V^*_{pq}$ and $V^*_{qq}$ become identical respectively with 
$V_{pq}$ and $V_{qq}$ which occur in the solution to Pearson’s problem of selection (see 
equation (3), p. 295). In fact, we have even for finite n 


Via Vow 
and bev Vee Rap Voy Rop Rog + Rog Ruy Raph 


This proves Prof. Godfrey Thomson’s conjecture regarding the connexion 
between statistical selection and arrays of samples. 
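
One consequence of the result can be examined by simulation, offered here only as a hedged illustration and not as part of the proof. If the mean of $Z_{pq}$ in the array $Z_{pp} = V_{pp}$ is $V_{pp}R_{pp}^{-1}R_{pq}$ for every $V_{pp}$ and every n, then the average of $Z_{pp}^{-1}Z_{pq}$ over all samples should equal $R_{pp}^{-1}R_{pq}$ whatever the sample size. The sketch takes p = q = 1, uses the divisor n for the sample variances (an assumption that cancels in the ratio), and uses arbitrary population values.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical population with p = q = 1.
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
target = R[0, 1] / R[0, 0]        # R_pp^{-1} R_pq

def mean_ratio(n, n_samples=20000):
    """Average of Z_pq / Z_pp over many samples of n individuals."""
    x = rng.multivariate_normal(np.zeros(2), R, size=(n_samples, n))
    d = x - x.mean(axis=1, keepdims=True)
    z_pp = (d[:, :, 0] ** 2).mean(axis=1)
    z_pq = (d[:, :, 0] * d[:, :, 1]).mean(axis=1)
    return (z_pq / z_pp).mean()

for n in (6, 12, 30):
    print(n, mean_ratio(n), target)
```

The simulated averages should stay near the same target for small and large n alike, in keeping with the statement that the mean of $Z_{pq}$ in the array does not depend on the size of the sample.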
THE CRANIAL AND OTHER SKELETAL REMAINS OF 
TASMANIANS IN COLLECTIONS IN THE 
COMMONWEALTH OF AUSTRALIA 


By J. WUNDERLY, D.D.Sc. 


1. INTRODUCTION 


A RESEARCH grant from the University of Melbourne enabled me to commence, 
in 1930, an enquiry into the physical anthropology of the extinct Tasmanians. 
The physical remains of these people available for examination consist chiefly 
of crania. In the last twenty-five years, a fairly large number of skulls claimed 
to be of Tasmanian origin has been added to public and private museum col- 
lections. The total number of crania contained in collections in the Common- 
wealth of Australia (including Tasmania) to which my data refer is 114. As 135 
years have passed since the beginning of European settlement in Tasmania, it 
does not appear likely that many more crania will be unearthed in the island, 
unless a special search is made for them. The data from the material at hand 
should, therefore, be recorded for future reference, in case the specimens them- 
selves be lost or damaged. 

The aim of my investigation was to examine the conflicting records, opinions 
and methods of various authors, and to record original observations, with the 
object of presenting a true picture of the craniology of the Tasmanian aborigines. 
The basis of the enquiry, the methods employed, and the instruments used are 
referred to in appropriate sections of this paper. 

The work associated with the enquiry was done in the Anatomy School of 
the University of Melbourne, in the public museums of Melbourne, Adelaide, 
Hobart, and Launceston, in the Institute of Anatomy at Canberra, and in the 
residences which contain the privately owned collections. 

The data in this article have been carefully gathered in the hope that they 
will be of use to those who are competent to deduce from them information 
of value to all who are scientifically interested in the extinct Tasmanian race. 
I leave the more elaborate forms of biometric analysis to others, as my special 
knowledge is in anatomy, and not in statistics. 

Our knowledge of the origin and migration of the extinct Tasmanian abori- 
gines seems to be no further advanced to-day than it was in 1914, when Sir William 
Turner completed his classical enquiry into their physical characteristics. This 
fact is the more remarkable because many related enquiries have been made in 
the meantime. Turner not only made a sound, systematic and practical in- 
vestigation of the physical characteristics of these people, but he also examined 
and reviewed impartially almost all the writings on the subject published in 
English, French, or German up to his time. 

The total physical remains of the extinct Tasmanian race are very small in 
quantity. They consist chiefly of about two hundred crania, many being in a 
bad state of preservation, and of not more than a dozen skeletons, a few dozen 
odd limb bones, and some odd specimens of hair and dried hands. Efforts to 
increase our knowledge of the physical anthropology of these people, therefore, 
depend mainly on investigations of their osteological remains. 

Since the publication of “The Non-metrical Morphological Characters of the 
Tasmanian Skull” (Wunderly & Wood Jones, 1933) fourteen crania claimed to be 
Tasmanian have been added to collections in the Commonwealth of Australia. 
The number of crania included in my “Tasman” series is now 114, all of which 
have been systematically examined by me at least twice. A close study has been 
made of all the specimens in this series, in order to obtain a concise record of 
their morphological and anatomical characteristics. Particular attention has 
been given to the question of whether all the crania are authentic remains of 
Tasmanian full-blood aborigines or not, and an attempt has been made to 
classify the specimens correctly according to racial origin and sex. Special atten- 
tion has been devoted to an important discovery of crania and limb bones at 
Eaglehawk Neck in Tasmania. 

The metrical and anatomical data have been compared with those recorded 
by several investigators who have worked in Australia since 1897. Cranial 
anatomical characteristics observed during the present enquiry are listed and 
correlated with those recorded by Turner (1884 and 1908) with a view to 
building up a definitive basis for the racial diagnosis of skulls of Tasmanian 
full-blood aborigines. 


2. AN EXAMINATION OF SOME ARTICLES WRITTEN SINCE 1897 ON THE 
PHYSICAL ANTHROPOLOGY OF THE TASMANIANS 


Such widely diverse opinions are expressed in these articles that it was 
found necessary to enquire into the basis of each. Many of the articles are well- 
known and their merits are so obvious, and have already received such favour- 
able comment, that reference is confined chiefly to the defects, if any, that have 
been found in them, or in the work on which some were based. Some of these 
defects could not have been discovered except through an examination of the 
specimens themselves. Correlation of all that has been found for, and against, 
the various articles enables one to assess the degree of reliability of the results 
and the reasonableness of the opinions expressed in them. 

The articles examined, given in the appended list of references, deal with 
crania claimed to be Tasmanian, and contained in collections in the Common- 
wealth of Australia. The numbers of specimens treated are given in the following 
table. 
Year of        No. of skulls 
publication    examined        Investigators 
1898           18              Harper & Clarke 
1909-14        52              Berry & Robertson 
1912-14        —               Büchner 
1910           —               Cross 
1916           3               Ramsay Smith 
1924           6               Wood Jones & Campbell 
1928           6               Hrdlička 
1929           —               Wood Jones 
1933           100             Wunderly & Wood Jones. 91 of the skulls were 
                               examined by the former and 6 by the latter 
1935           —               Wunderly 
1938           114             Wunderly (present article) 


It was observed that although many of the writers refer to the work of 
Huxley, Turner, Duckworth, Broca, Topinard and others, yet only a few show 
evidence in their methods, or their writing, that they understood and could 
apply the teaching of these physical anthropologists. One way in which all but 
Harper & Clarke (1898), Hrdlička (1928) and Wunderly (1935) have failed to do 
so is in not adopting a critical attitude towards the material to be examined, 
in order to distinguish authentic from unauthentic specimens. 

The investigations on which the articles have been based will be referred to 
separately. 

(a) Harper & Clarke (1898). These anthropologists examined eighteen crania 
labelled “Tasmanian”, which were contained in the Tasmanian Museum at 
Hobart, this being the first systematic investigation of its kind made in the 
Commonwealth. Their article contains abundant proof that they understood 
the work, and applied the teaching of Turner and others. Great praise is due to 
them for having made a preliminary critical survey of the material available to 
them, which resulted in three skulls being rejected as “improperly classed”, 
and three others being classified as the remains of half-castes. The craniometrical 
measurements recorded by Harper & Clarke were made directly on the skulls 
themselves, and they defined clearly the anatomical points between which the 
measurements were made. During the present enquiry their measurements were 
checked on the crania on two separate occasions, and neither a fault in their 
methods nor an inaccuracy in their records has been found. Their report is 
considered to be worthy of premier position among the earlier articles on the 
Tasmanians written by investigators in Australia. In the present enquiry the 
classification of the specimens examined by Harper & Clarke is consistent with 
theirs in so far as it draws a line between those which are, and the others which 
are not, the remains of full-blood Tasmanian aborigines. 

(b) Berry & Robertson (1909 a, b, c); Berry et al. (1910, 1914). The names of 
these investigators are associated with those of a team of workers which included 
a professor of anatomy, a medical graduate, two research mathematicians, and 
two medical students. Their reports have received such wide publicity that 
reference will be made to some aspects of their work which are not generally 
known. 

Morant (1927) has already referred to defects which he found in Berry & 
Robertson’s work. In marked contrast to Harper & Clarke, Berry & Robertson 
did not adopt a critical attitude towards the authenticity of the material 
available for examination. Among the fifty-two crania which they examined 
and accepted as authentic were all that were rejected by Harper & Clarke, or 
classified by them as the remains of half-castes. Other specimens included in 
the fifty-two have also been classified as unauthentic in the present enquiry, 
or as unsuitable as sources from which to obtain reliable data. Some of the 
latter kind have already been referred to and illustrated by Wunderly (1935). 
They took no account of the diagnostic anatomical characteristics of the 
Tasmanian skull as outlined by Turner (1884, 1908). Had they been familiar 
with Turner’s teaching, they could hardly have failed to recognize the un- 
authentic specimens. They did not give an explicit account of the basis on which 
they judged all the crania to be authentically Tasmanian, beyond expressing 
the opinion that “every one presents over 90% of the features so character- 
istically found in the skull of the Tasmanian aboriginal”. 

Berry & Robertson’s descriptions of their methods contain many incon- 
sistencies. For example, they state in some places that their measurements 
were made on the skulls, but in others they mention that many were made on 
dioptographic drawings, which were regarded by them as satisfactory sources 
from which to obtain accurate measurements. They placed such a high value 
on dioptographic drawings that special reference to them seems to be 
justified. 

The Berry & Robertson team made 211 (1909c), and the writer has made 
245, dioptographic drawings of Tasmanian crania. The same individual Martin’s 
dioptographic apparatus was used in both cases. The instrument was tested 
by me for accuracy, and it was found that, after the most careful adjustment, 
the error in the drawing could not be reduced below 2%, and it was frequently 
as high as 4%, of the direct measurement. In many of their drawings there is no 
indication that certain parts had been lost through damage. In such instances 
many writers have assumed that Berry & Robertson’s drawings represent the 
intact skull. It is unreasonable to expect fine accuracy in dioptographic 
drawings because, in addition to the errors introduced by mechanical defects 
in the apparatus, there are the added sources resulting from the inking by hand 
over a pencilled line, and the uncertainty due to the width of the line of the 
drawing. My conclusion is that dioptographic drawings may be regarded as 
useful general representations of the various normae of a skull, but that they 
are not reliable as sources of measurements. 
The measurements recorded by Berry & Robertson were checked on the 
specimens by me on at least two separate occasions. While the majority were 
found to be correct, many errors were discovered, some as great as 10%. 
Although Berry & Robertson could have measured the cranial capacity in 
twenty-four of the fifty-two crania referred to by them, it has been observed 
that they have only recorded it for the same specimens as those for which 
Harper & Clarke had already recorded it, with the exception of one which had 
been lost in the time between the two enquiries. Furthermore, Berry & Robert- 
son’s figures are the same as those of Harper & Clarke. These facts give one 
the impression that perhaps Berry & Robertson overlooked an obligation to 
acknowledge the use of Harper & Clarke’s figures. 

(c) Büchner (1912 a, b). Büchner was one of the mathematicians associated 
with Berry & Robertson. He depended on the dioptographic drawings for his 
metrical data, and disclosed his opinion of the drawings when he stated that the 
“diagrams are therefore strictly accurate and correlative”. In a special enquiry 
made by him into the degree of prognathism of the Tasmanian skull, he again 
depended on measurements made on the drawings. In many of the skulls the 
bone in the region of the prosthion had been lost through damage before Berry 
& Robertson made the drawings, but this loss is not indicated by them. Con- 
sequently, many of Biichner’s basio-alveolar measurements are not reliable, a 
defect for which the anatomist rather than the mathematician should be held 
responsible. 

It should be noted that Büchner measured the nasio-alveolar and the basio-alveolar 
diameters from a common point (the alveolar point), whereas separate 
points are used in the specifications of the International Agreement. The 
distinction between the alveolar point and the prosthion has been overlooked 
by a majority of those who have gathered craniometric data in Australia. 

(d) W. Ramsay Smith (1916). There is little to add to the comments on the 
work and writing of W. Ramsay Smith which have already been made by me 
(1935). He examined two skulls and a fragment contained in the collection of 
the Australian Museum, Sydney, and one which was at that time in a private 
collection, but has since been lost. His article does not contain any reference to 
the work of Turner, or any evidence of knowledge of Turner’s work and writing. 
He accepted the material submitted to him as authentic without question, 
although the classification of one skull was based on no 
better evidence than a label which had been attached to it in the Sydney Museum, 
and which was inscribed “Tasmanian from Hobart”. Turner emphasized the 
folly of relying on such labels. Smith accepted this skull as that of a male 
Tasmanian aboriginal. It does not possess any of the characteristics defined by 
Turner as indicative of masculinity, and the facial part exhibits clearly marked 
European characters. In the present enquiry it has been classified as the remains 
of a female mixed-blood (Australian-European). 

Ramsay Smith did not specify the methods he used in measuring the crania, 
or compare his results with those obtained by other investigators. 

(e) Wood Jones & Campbell (1924). These enquirers examined and described 
six skulls contained in the collection of the South Australian Museum in 
Adelaide. They took measurements on the skulls in accordance with the 
specifications of the International Agreement, except that they took both the 
nasio-alveolar and basio-alveolar diameters to the alveolar point. Their introductory 
remarks indicate that they seem to have over-estimated the value and accuracy 
of dioptographic drawings. They accepted, without question, all specimens as 
being authentically Tasmanian. This fact is of particular interest in the light of 
Hrdlička’s later description of some as of “Australian type”, and others as of 
“type quite Australian”. In the present enquiry the specimens referred to by 
Hrdlička as resembling the Australian skull have been classified as not the 
remains of full-blood Tasmanian aborigines. 

Wood Jones & Campbell’s article does not refer to the work and writing of 
Turner or of any other physical anthropologist of note who had made a special 
study of Tasmanian craniology. The value of their craniometrical data is 
depreciated by the inclusion of measurements which are merely visual estimates 
of the correct dimensions. Because errors were found in some of the measure- 
ments recorded by them, check measurements were made on two separate 
occasions during the present enquiry. 

(f) Hrdlička (1928). Hrdlička examined six crania claimed to be Tasmanian 
when he was in Australia in 1925. These specimens were contained in the Adelaide 
and the Melbourne public museums. He also examined thirty-one Tasmanian 
crania contained in the museum of the Royal College of Surgeons, London. 
While in Australia he measured and described a large number of Australian 
crania. His reports show that his knowledge of the craniology of the Tasmanians 
and the Australians was sufficiently sound to enable him to distinguish the 
differences—in some instances very small—between the crania of these 
races. 

It is greatly to his credit that, in the short time available to him, he recog- 
nized that four of the six specimens referred to were not Tasmanian in type, 
but were of Australian type. These four crania had been previously accepted 
uncritically by Wood Jones & Campbell as the authentic remains of Tasmanian 
full-blood aborigines. In the present work the four specimens have been classed 
as the remains of Tasmanian-Australian mixed-bloods. This classification was 
based on the Tasmanian and the Australian diagnostic anatomical character- 
istics, and it is confirmed by the reliable evidence of ethnological remains and 
historical and geographical records. 

(g) Wood Jones (1929b). An excellent suggestion was made by Wood Jones 
that composite drawings of the hypothetical skull, occupying the position of 
mean in a series of crania, would be useful in conveying an idea of the general 
form of each racial type of skull.* If separate composite drawings had been 
prepared for each sex, their value would have been more than twice as great as 
that of a single drawing. It is unfortunate that the composite drawings prepared 
by Wood Jones were based on data obtained from the work of Berry & Robert- 
son, and not from new data. 

Summarizing the examination of the articles referred to in the light of the 
results of the present enquiry, it has been found that: 

(i) the critical attitude adopted by Harper & Clarke and by Hrdlička 
towards the material which they examined was an essential preliminary step 
towards the ultimate correct classification of the crania in the Tasman series, and 

(ii) the metrical data provided in their articles are more reliable than those 
contained in the others. 


3. THE AUTHENTICITY OF THE SKULLS 


When the present enquiry began it was found that a relatively large number 
of skulls regarded as Tasmanian in origin had been added to public and private 
collections in the Commonwealth of Australia since 1909, when Berry & Robert- 
son announced that the number was then fifty-three. A preliminary survey of 
material available for examination revealed that some specimens, although 
labelled “Tasmanian”, are remains of Europeans. These skulls were doubtless 
unearthed in Tasmania, but they do not exhibit any evidence of aboriginal 
origin. For this reason they have not been included in the Tasman series of 
crania. 

Still other specimens do not exhibit even slight resemblance to the Tas- 
manian, or any Negroid, or Negrito type of skull, the difference in a few cases 
being conspicuous to a gross extent. Occupying intermediate positions between 
these specimens and the skulls of general Tasmanian type are some crania 
which possess characteristics suggesting origin from either (a) two aboriginal 
races, or (b) a European and an aboriginal race, or (c) a Mongolian (Chinese) 
and an aboriginal race. 

Turner concluded that some skulls examined by him had been regarded as 
authentically Tasmanian merely because they were labelled “Tasmanian” by 
collectors or museum officials with no special knowledge of craniology. He con- 
sidered that some such specimens were remains of half-castes, with Polynesian 
or European admixture. 

In the present enquiry all crania claimed to be Tasmanian have been 
included in what is called the “Tasman” series, provided they exhibit any 
evidence, however small, of origin from an aboriginal race; but the Tasman 
series has been divided into several sections, which enable the specimens of the 
same racial origin, whether full-blood or mixed-blood, to be grouped together. 
Particulars of the individual specimens are given in Appendix I. 

[* The numerous type contours for racial series of crania which have been given in papers in 
Biometrika since 1911 are composite drawings of the “mean type”. Ed.] 

Reliable historical records, narratives, and official documents contain 
abundant reference to mating between Tasmanian aborigines and either Aus- 
tralian aborigines or Europeans. According to West (1852) and others, some 
Australian aborigines were sent to Tasmania in 1820 and 1828, owing to an 
official plan for pacifying the Tasmanians; inter-marriage between Tasmanians 
and these Australians was common. One Australian native was responsible for 
many murders, a victim being a Polynesian. Many of the Australians are known 
to have died in Tasmania. There is also reliable proof that “blackbirders”, 
whalers and sealers were responsible for transporting natives, particularly 
women, to Tasmania from other localities. Many of these women gave birth to 
half-caste children while in Tasmania. It is also possible that some racial 
admixture occurred in the eighteenth century, or earlier, as the result of visits 
of adventurous explorers who left no records of their voyages owing to illiteracy 
or shipwreck. 

Numerous photographs of groups of Tasmanian natives are available in 
which it is easy to distinguish the facial differences between full-bloods and 
mixed-bloods, particularly those of Tasmanian-European origin. Official records 
show that the mixed-blood inhabitants of Tasmania resulted from mating 
between two or more of the following races: Tasmanian, Australian, European, 
Chinese, Indian, Japanese, Maori, Negro, Polynesian, Syrian, and others. 

The individual histories of a number of the crania emphasize the need for a 
critical attitude towards their authenticity. Since 1897 several specimens have 
disappeared from collections, some in suspicious circumstances. It is known 
that “trafficking” in Tasmanian crania has occurred on a few occasions, and 
that an unauthentic was substituted for an authentic specimen at least once. 
Prolonged enquiry has revealed that some supposed Tasmanian specimens were 
unearthed on the Australian mainland, while others were gathered from still 
more distant localities. Some of the latter belonged to private collectors 
residing in Tasmania, after whose death the collections were presented or sold 
to museums, unaccompanied by written records, or were divided among sur- 
viving relatives. The supposition that some such specimens are of Tasmanian 
origin has therefore no sound foundation. Ethnological specimens, geographic- 
ally associated with the unearthing of some crania in Tasmania, are racially 
and culturally referable to races other than the Tasmanian. A description of 
them will be contained in an article on the origin of the Tasmanian race, now 
in course of preparation. 

The crania comprising the Tasman series have been gathered principally 
through organized search or casual finding. Some, however, have been acquired 
as gifts or as the result of purchase. In some instances the persons from whom 
they were obtained recorded where they had been found, while in others this 
information was not given. Not a few of the crania had been in several collections 
before reaching their present locations, and in some cases the names of the 
owners of those former collections cannot be traced. Persistent enquiry has 
elicited evidence which shows that some specimens had not been found in 
Tasmania. 

The Tasmanian diagnostic anatomical characteristics, which were very 
clearly defined by Turner, were used as a basis for the essential classifications. 
Whenever collateral evidence, in the form of reliable ethnological, historical or 
official data, was available, it was consulted. It is not claimed that the classification 
has been made without error, but much time and effort have been expended 
in an attempt to classify the specimens accurately on the chosen basis. 


4. THE ANATOMICAL DIAGNOSIS OF TASMANIAN CRANIA 


From among many publications which had, at first, been regarded as suitable, 
one book by Duckworth (1904) and four papers by Turner (1884, 1908, 
1910, 1914) were finally selected as a basis for the diagnostic criteria. Duckworth was 
relied on as a guide in principles. The many anatomical characteristics exhibited 
specifically in crania, authentically the remains of Tasmanian full-blood abori- 
gines, were so minutely observed by Turner that his comprehensive description 
of them is considered the best ever published. 

Based on the thirty-six listed characteristics defined by Turner (1908), a 
primary classification was made to separate the skulls of Tasmanian full-bloods 
from others. Twenty-three additional characteristics have been gathered during 
the present enquiry, and have been used to supplement those of Turner; these 
have proved helpful in diagnosing remains of mixed-bloods. Some of the 
characteristics refer only to male, and others only to female, skulls. Any skull 
exhibiting over 75% of such characteristics is classed as the remains of a 
Tasmanian full-blood aborigine. 
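
The threshold rule just stated can be written out as a short calculation. The sketch below is illustrative only: the text does not say how the percentage was reckoned, so it is assumed here that the denominator counts only those characteristics which apply to the sex of the skull and can be observed on it. The function names, the record layout and the example values are all hypothetical.

```python
from typing import Mapping, Optional

# Strict "over 75%" reading of the text; the choice of denominator is an assumption.
THRESHOLD = 0.75


def share_exhibited(scores: Mapping[str, Optional[bool]]) -> float:
    """Return the fraction of applicable characteristics that a skull exhibits.

    `scores` maps a characteristic label to True (exhibited), False (not
    exhibited) or None (not applicable to this sex, or not observable).
    """
    applicable = [value for value in scores.values() if value is not None]
    if not applicable:
        raise ValueError("no applicable characteristics were recorded")
    return sum(applicable) / len(applicable)


def exceeds_threshold(scores: Mapping[str, Optional[bool]]) -> bool:
    """True when more than 75% of the applicable characteristics are exhibited."""
    return share_exhibited(scores) > THRESHOLD


# Hypothetical record: labels and values are invented for illustration only.
example = {
    "verticalis-1": True,
    "verticalis-2": True,
    "verticalis-5": None,   # male-only characteristic, skull judged female
    "lateralis-1": True,
    "lateralis-3": False,
    "facialis-1": True,
}
print(share_exhibited(example))    # 4 of the 5 applicable characteristics
print(exceeds_threshold(example))
```

On this reading a skull showing exactly three-quarters of the applicable characteristics would not qualify, since the text requires "over" 75%.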

It is interesting to note that in all skulls classed as remains of Tasmanian 
mixed-bloods, the cranial part is Tasmanoid in general form, while the facial 
part shows the foreign characteristics; furthermore, it is in these foreign facial 
features of the skull that one sees the clue to the identity of the admixing race, 
whether European or otherwise. 

The differences in certain anatomical features found in skulls of Tasmanian 
full-bloods, Tasmanian-European mixed-bloods, and Australian full-bloods are 
listed below. The data for Tasmanian full-bloods are taken from Turner’s papers, 
for the mixed-bloods from my own observations, and for Australian full-bloods 
from Turner and others. Photographs of typical full-blood Tasmanian skulls 
are reproduced in Plates I-V and of skulls presumed to be those of Tasmanian- 
Australian half-bloods in Plate VI. 

List of diagnostic Tasmanian and related cranial characteristics 

No. of characteristic | Tasmanian full-blood (taken from Turner) | Tasmanian-European mixed-blood (Wunderly) | Australian full-blood (from Turner and others) 

Norma verticalis 
1 | Elongated and dolichocephalic; some ovoid or pentagonal | Ovoid, or anterior ovoid and posterior somewhat pentagonal | Elongated and ovoid 
2 | Parietal eminences prominent | Some show little prominence | Practically no prominence 
3 | Behind eminences width rapidly decreases to occiput | Less rapid in some | Decrease in width is much more gradual 
4 | Frontal eminences distinct | More fullness in frontal area | Forehead recedes more abruptly 
5 | Male skulls show triangular area anterior to bregma | - | (Not recorded for a series) 
6 | Male skulls have shallow depression lateral to this triangle | - | (Ditto) 
7 | Frontal breadth small compared with maximum cranial breadth | Frontal breadth greater than usual | Less difference between frontal and maximum breadths 
8 | Skulls keeled along sagittal suture: keel usually limited to anterior one-third of suture | Generally less keeling | Generally keel more prominent, and extending along whole length of suture 
9 | Middle or posterior one-third of sagittal suture usually depressed | Depression well-marked in some | Depression unusual 
10 | Parietal foramina small or obliterated | Small | Small 
11 | Supra-inial region large and rounded in female skulls, small in males | Female fairly large and rounded, male not large | Not so noticeable in male or female 
12 | Inion not large | Sometimes large | Sometimes large 

Norma lateralis 
1 | Forehead recedes in males and more nearly vertical in females | Forehead generally fuller | Forehead more receding 
2 | Glabella and supraciliary ridges prominent in males | Less prominent | More prominent 
3 | Nasion deeply depressed | Some have little or no depression | Depression usually deep 
4 | Between obelion and lambda the vault slopes gradually downwards | Slopes less gradually | Slopes more gradually 
5 | Supraciliary ridge and upper border of orbit project in front of lower border | Usually level or upper behind lower | Level or upper behind lower 
6 | Outer border of orbit far behind inner border | Level in some | More nearly level 

Norma facialis 
1 | Vault roof-shaped | Roof-shaped or more rounded | Roof-shape more acute 
2 | Absence of grooves above supra-orbital foramina | Grooves in some | Grooves common 
3 | Maxillo-nasal spine diminutive | Spine prominent in some | More prominent 
4 | Breadth of anterior nares usually greater than half height | Breadth less, aperture almost parallel-sided in some | Breadth compared with height less than in Tasmanian, lower border ‘guttered’ 

List of diagnostic Tasmanian and related cranial characteristics (cont.) 

No. of characteristic | Tasmanian full-blood (taken from Turner) | Tasmanian-European mixed-blood (Wunderly) | Australian full-blood (from Turner and others) 

Norma facialis (cont.) 
5 | Nasal margins rounded | Margins very sharp | Less rounded 
6 | Canine fossae distinct and in some very deep (deeper in female than male skulls, J. W.) | Usually shallow | Deeper than in Tasmanian 
7 | Orbits low but wide | Orbits high, more nearly circular and borders sharp | Orbits approach square in many, more varied in shape than in Tasmanian 
8 | Infra-orbital suture usually obliterated | (Not recorded) | (Not recorded) 
9 | Malar bones small | Malar bones large in some | Malar bones large 

Norma basalis 
1 | Palate wide, and shallow to moderate height; none high | High and narrow in many | Wider and larger than in Tasmanian, often very high 
2 | Some exhibit fourth molar teeth | Not seen in any | Seen less often than in Tasmanian 
3 | No instance of artificial extraction of incisor tooth | Not seen | Common 
4 | No malocclusion of teeth (in present work a few instances of impaction of mandibular third molar teeth seen, J. W.) | Many show malocclusion of teeth | Impaction of mandibular third molars in some 

Norma occipitalis 
1 | Many have wormian bones in lambdoid suture | Less common | Not as usual as in Tasmanian 
2 | Inion not large | Large in some | Large in greater number than in Tasmanian 
3 | Superior curved occipital line prominent in some and divided into upper and lower lines in others | - | - 
4 | Third occipital condyle was not seen in any specimen | Third condyle not seen | (Not recorded) 
5 | Two skulls have external pterygoid plate fused with spine of sphenoid and pierced with two pterygo-spinous foramina | - | - 

Supplementary diagnostic characteristics (Wunderly) 

General 
1 | Surface of bone very smooth | Some rough | Rougher than Tasmanian 
2 | Areas of attachment of muscles only slightly uneven | More uneven | All more uneven 
3 | All borders and margins rounded | Borders of facial part sharp | Not so rounded as in Tasmanian 
4 | General characteristics exhibited most clearly in distinctly masculine skulls | - | - 
5 | Closer resemblance between juvenile skulls of either sex and adult female than between latter and adult male skulls | - | - 

List of diagnostic Tasmanian and related cranial characteristics (cont.) 

Sections, in printed order: Norma verticalis, Norma lateralis, Norma facialis, Norma basalis. 

Tasmanian full-blood | Tasmanian-European mixed-blood | Australian full-blood 

Superior temporal lines do not approach as close to sagittal suture as in the Australian | Not so close as in Tasmanian | Approach very close in many, especially males 
Depression in sagittal suture (Turner) diamond-shaped, longer diameter coinciding with suture | Seen in some | Seen in some 
Middle one-third of each side of coronal suture slightly complicated in some skulls | - | Not complicated so often as in Tasmanian 
In some specimens nasal bones weakly aquiline and markedly convex medio-laterally, very narrow at constriction | More aquiline or flatter, not narrowly constricted, some parallel-sided | Not so narrowly constricted as Tasmanian, generally wider, sides of some nearly parallel 
Height of mandible behind second molar usually much less than symphysial height | Heights more nearly equal | Height behind second molar larger compared with symphysial, both larger than in Tasmanian 
Squamous temporal flat antero-posteriorly and from above below, whole temporal fossa flat | Slightly convex in some to full in others | Flat or slightly convex 
Very slight inclination, if any, between upper and lower borders of orbits | More inclination | Some show great inclination 
Fronto-nasal and fronto-maxillary sutures usually almost straight | Fronto-nasal elevated in many | Straight or elevated above nasion 
Maxillary palatal torus common | Not so common | Fairly common 
Zygomatic arches thin medio-laterally | Usually thicker | Much thicker and rougher 
Teeth not very large | Teeth smaller and degenerate, very little wear | Teeth larger, greater wear than in Tasmanian 
Remarkable approach to uniformity of form and size in corresponding teeth | Wide differences | Not so uniform as in Tasmanian 
Morphological elements more distinctly outlined than in teeth of Australians, and still more than in those of Europeans | Elements indistinct | Not so distinct 
Majority of teeth show greater number of these elements than are seen in Australians or Europeans | Fewer elements | Not so many as in Tasmanian 
Closer resemblance between Tasmanians’ permanent and deciduous teeth (judged by the usually accepted descriptions of the latter) than seen in Australians or Europeans | Less resemblance | Not so much alike as in Tasmanian 
Form of upper dental arch U-shaped | Variable in form and often irregular in shape | Usually parabolic or similar to elongated horse-shoe 
Teeth occupy regular positions in each arch | Irregular to very irregular in position | Regular 
Dental caries not found in any skull of aboriginal who lived in natural state, seen in many skulls of those who lived in contact with civilization | Extensive caries in large majority | Rare in natural state, common in contact with civilization 


5. SKELETAL REMAINS FOUND AT EAGLEHAWK NECK 


A description of the finding of aboriginal adult and juvenile skeletal remains 
at Eaglehawk Neck, on the east coast of Tasmania, in 1919, was published by 
Lord (1919), from whose paper the following extract is taken: 


Upon arrival at Eaglehawk Neck, in company with Mr Brister and Mr W. H. Clemes, 
I found that a slight sandslip had occurred on the south-eastern face of one of the large 
sand dunes forming Eaglehawk Neck. A number of small bones appeared on the surface, 
and after collecting these a start was made to examine below the surface. Upon excavation 
a number of larger bones and several skulls were revealed. Owing to the fact that the dune 
in question was covered with Boobialla (Myoporum insulare), and the roots in many cases 
completely filled the cavities of the bones, the task of exhuming these relics of a bygone 
race was one of considerable difficulty. 


The bone in all the specimens is extraordinarily clean, a condition no doubt 
due to their burial in sand. Unfortunately, a large majority of the bones were 
broken during the difficult exhumation, and, although the cranial part of some 
of the skulls is intact, the fragments of the facial parts cannot be identified as 
belonging to any particular cranium. Limb bones were also found, but many 
are broken. All these specimens are deposited in the Tasmanian Museum, 
Hobart; a list of 330 of them was published by Lord & Crowther (1920). Since 
there is no evidence of ante-mortem injury to indicate death by fighting, it is 
probable that a tribal group perished from some natural cause. 

Five of the crania are sufficiently well preserved to enable reliable racial 
diagnoses to be made, and also to provide anatomical and metrical data of 
value. They are numbered as follows: 

 | Male | Female | Juvenile 
Tasman series Nos. | 79, 80, 81 | 78 | 82 


In addition to the specimens included in the Tasman series, all fragments of 
facial and cranial parts were examined. The Tasmanian anatomical character- 
istics described by Turner are clearly exhibited in the Eaglehawk Neck remains, 
a few being more marked than in any other specimens. Not a single character- 
istic was found, whether facial or cranial, that would suggest either admixture, 
or racial origin other than Tasmanian. Turner noted that, while all skulls of 
Tasmanian full-bloods examined by him bear a general resemblance to one 
another, yet minor differences occur in individual specimens; for instance, the 
cranium viewed in norma verticalis may be elongated and dolichocephalic, ovoid 
or pentagonal. Similar differences seen in the Tasman series of skulls have been 
noted during the present enquiry in the case of cranial form, form of the orbit, 
the region of the forehead and nasion, and several other characteristics. 

Because the cranial remains at Eaglehawk Neck were found simultaneously 
and in the one locality, it was decided to compare them with the 
other crania classified as those of Tasmanian full-bloods in the Tasman series. 
When these crania are roughly divided into two groups, the one containing skulls 
of aborigines known to have died since European settlement began, and the 
other containing those unearthed in the earlier days of settlement, it is found 
that the Eaglehawk Neck specimens resemble the latter more closely than the 
former group. In these two groups (“old group” and “recent group” in the 
following table), the better-known Tasmanian characteristics differ to some 
extent, as shown below: 


Characteristic | Recent group 


Cranial size | Smaller | Larger 
Cranial form | Angularly pentagonal | Curvilinearly pentagonal 
Orbit | Somewhat rectangular | Markedly rectangular 
Nasion | Depressed | Deeply depressed 
Parietal eminences | Prominent, rounded | Prominent, angular 


The general difference between the “old” and the “recent” group of skulls 
suggests that the latter exhibit greater specialization, due perhaps to long 
occupation in a restricted insular environment. The general difference between 
the two groups is not regarded as indicative of a difference in racial impurity. 

The maximum lengths of the three Eaglehawk Neck skulls of males occupy 
the first, second and fourth places, respectively, in the table of measurements 
(Appendix III) of the crania of male Tasmanian full-bloods. The maximum 
breadths of their vaults occupy the first, fifth and tenth positions, respectively. 
The female specimen fills the second place among the skulls of the female 
Tasmanian full-bloods in the case of the maximum length, and, with three 
other female specimens, it shares the sixth place in the case of the maximum 
breadth of the vault. It is therefore apparent that the Eaglehawk Neck crania 
are within the recognized metrical limits of Tasmanian crania so far as size 
is concerned. 

Turner pointed out that in many Tasmanian skulls a part of the sagittal 
suture lies in a depression between two lateral ridges. This characteristic is 
well marked in the Eaglehawk Neck skulls, the depth of the depression being 
5 mm. in No. 80 of the Tasman series. Individual measurements of the femora 
and tibiae from Eaglehawk Neck are given in Appendix II. 

6. THE TASMAN SERIES OF CRANIA: METRICAL DATA 

Particulars of the crania in this series—their present locations, the related 
individual reference numbers, and the related racial and sexual classifications— 
are given in Appendix I. Measurements of the specimens accepted as representing 
full-blood Tasmanians and also those of Tasmanian-Australian half-bloods are 
given in Appendix III. 

The basis on which the classification has been made in the present enquiry 
has already been mentioned. It is believed that the classification is generally 
reliable to the extent that it separates the crania of Tasmanian full-blood 
aborigines from those of other origin, whether full-blood or mixed-blood. As 
regards two specimens, which have been included with the skulls of the Tas- 
manian full-bloods, there is a small doubt, though the evidence is not considered 
sufficient to justify their exclusion. The recognition of crania of Australian 
full-bloods, and a majority of those classified as the remains of Tasmanian- 
European mixed-bloods, has presented no difficulty, but some uncertainty 
exists as to whether some of the latter skulls are remains of Tasmanian-European 
or Tasmanian-Chinese mixed-bloods. 

Seven out of eight mandibles unassociated with crania have been classified 
as remains of Tasmanian full-bloods, because there is not sufficient evidence to 
exclude them. One mandible has been classified as that of an Australian full-blood; 
its rugged construction, large and greatly worn teeth, and the form of its 
dental arch all differ from the corresponding features in authentic Tasmanian 
mandibles. 

Thanks to Turner’s descriptions, most of the skulls have been easy to 
classify according to sex. A small minority proved difficult, but it was considered 
preferable to attempt to classify correctly each skull, rather than to relegate 
any to a group of specimens of unassigned sex. One or two regarded as the 
remains of female Tasmanian full-blood aborigines may be those of males: their 
anatomical characteristics indicate femininity, while their cranial capacity 
suggests masculinity. 

Of the 114 specimens in the Tasman series—all of which are in Common- 
wealth collections, it should be remembered—I took measurements of 101, the 
remaining thirteen being too fragmentary for the purpose. The individual 
readings for the fifty-eight adult skulls judged to be those of full-blood Tas- 
manians, and the eight adult skulls judged to be those of Tasmanian-Australian 
half-bloods are given in Appendix III. Of the total series of 114 specimens, 
Berry & Robertson have published measurements of fifty-two, Harper & 
Clarke of fifteen, Hrdlička of seven, Wood Jones & Campbell of six and Ramsay 
Smith of three. 

Means derived from my measurements of the full-blood Tasmanian skulls in 
the Tasman series are compared in Table I with means given by Morant (1927) 
which were obtained by pooling the measurements provided by a number of 
earlier investigators.* It should be realized that the latter set is partly based 
on data for a few specimens which are not classed as full-blood Tasmanian by 
me, and also on a considerable number in European collections. 

[* Comparisons between the two sets of means are made in the Note by Dr Morant appended 
to Dr Wunderly’s paper. Ed.] 

Comparisons are made in Table II between a few means derived from my 
measurements and those given by other workers. The values given earlier 
relate partly to specimens in Commonwealth collections which I do not accept 
as full-blood Tasmanian (Harper & Clarke’s and Hrdlička’s) and partly to 
specimens in European collections (all Turner’s and most of Hrdlička’s). Most 
of the numbers are far too small to provide reliable means, but, nevertheless, 
a remarkably close agreement is found. It may be noted that the means for 
the very short Tasmanian-Australian mixed-blood and for the Australian full- 
blood series fall on the same side of all the other means in the case of the male 
and female cephalic and height-length and of the male nasal index. 


7. THE NON-METRICAL MORPHOLOGICAL CHARACTERS OF TASMANIAN CRANIA 


These characteristics, for material available in 1933, were recorded by 
Wunderly & Wood Jones (1933). Owing to the discovery of additional specimens, 
a revision of the data has been found necessary. In the present paper the 
characteristics are recorded only for the skulls of Tasmanian full-bloods (Tasman 
series, Section A). To make them more useful for purposes of reference, they 
have been recorded for each sex separately. The particulars in the former report, 
which are still applicable to all specimens now available, are not included in 
the present account. The directions given by Wood Jones (1929a) were again 
followed when recording the revised data. 


(i) Cranial form (fifty-seven crania of Tasmanian full-bloods) 

Reference has already been made to the cranial type of the Tasmanian skull, 
and to the two modifications in this type which have been observed in the 
present work. 

Norma verticalis. (a) The specialized form of skull, as seen particularly in 
the remains of the aborigines who died since the time of European discovery, is 
generally pentagonal, with “pronounced bosses situated far posteriorly on the 
parietal bones, and a relatively small minimum frontal breadth. The occipital 
region is broad, and well rounded, but in some specimens it is small in area and 
prominent. The medio-lateral thickness of the zygomatic arch is remarkably 
small compared with that of the Australian”. 

(b) In the Eaglehawk Neck skulls and some others which resemble them 
fairly closely it is seen that the general outline form is not so markedly penta- 
gonal, the parietal eminences are not so acutely prominent, and they are not 
situated so far posteriorly. In short these skulls are more gently rounded, and 
they do not exhibit the features which may be termed “outline angularities” 
that distinguish the specialized Tasmanian skull. 
TABLE I 
Mean measurements of series of Tasmanian skulls* 

                                          Male                                    Female
                                          Tasman series       Pooled series       Tasman series       Pooled series
                                          (Wunderly)          (Morant)            (Wunderly)          (Morant)

Max. glabella occipital length (L: 1)     185.4 (30)          182.2 (43)          177.9 (25)          174.6 (20)
Glabella-inion length (2)                 180.0 (30)          177.7 (36)          172.3 (25)          166.3 (16)
Maximum breadth (B: 3)                    138.2 (27)          136.0 (60)          135.8 (25)          132.4 (36)
Max. frontal breadth (B”: 6)              111.0 (25)          108.2 (24)          108.4 (25)          103.6 (10)
Max. bimastoid breadth (7)                119.9 (26)          —                   116.7 (22)          —
Min. frontal breadth (B’: 5)              94.7 (25)           94.0 (62)           92.9 (27)           90.1 (35)
Basio-bregmatic height (H’: 4a)           129.8 (24)          130.9 (55)          129.2 (22)          125.3 (35)
Auriculo-bregmatic height (4b)            114.1 (25)                              111.6 (23)
Chord nasion-basion (LB: 9)               98.1 (22)           98.8 (55)           94.6 (22)           92.7 (34)
Chord prosthion-basion (10)               100.3 (9)           —                   97.1 (10)           —
Length of foramen magnum (fml: 21a)       35.6 (22)           35.7 (53)           34.4 (22)           34.2 (31)
Breadth of foramen magnum (fmb: 21b)      29.0 (22)           29.6 (44)           29.0 (23)           28.4 (27)
Horizontal circumference (U: 23a)         515.8 (25)          511.3 (48)          500.1 (23)          489.5 (23)
Arc nasion-bregma (S₁: 22 (i))            128.8 (27)          127.2 (44)          125.1 (26)          121.3 (23)
Arc bregma-lambda (S₂: 22 (ii))           131.0 (28)          126.2 (42)          127.6 (24)          122.2 (23)
Arc lambda-opisthion (S₃: 22 (iii))       112.9 (23)          111.8 (37)          110.5 (22)          109.4 (16)
Arc nasion-opisthion (S: 22)              370.3 (21)          365.8 (36)          364.3 (23)          350.5 (15)
Broca’s transverse arc (23)               293.2 (25)          290.2 (40)          286.7 (23)          283.5 (17)
Chord nasion-alveolar point (G’H: 12)     62.4 (12)           62.5 (36)           61.1 (11)           59.9 (16)
Orbito-alveolar height (20)               38.8 (19)           —                   36.7 (16)           —
Bizygomatic breadth (J: 8)                130.4 (9)           131.0 (44)          126.6 (12)          122.0 (21)
Flower’s interorbital breadth (15)        22.7 (25)           25.3 (20)           22.0 (21)           23.8 (13)
Dacryal orbital breadth, R (O₁’R: 16)     38.1 (19)           }                   36.9 (20)           }
Dacryal orbital breadth, L (O₁’L: 16)     37.7 (19)           } 39.3 (40)         37.0 (18)           } 38.3 (18)
Orbital height, R (O₂R: 17)               29.9 (19)           }                   30.9 (20)           }
Orbital height, L (O₂L: 17)               29.3 (19)           } 31.05 (60)        30.6 (20)           } 31.7 (n illegible)
Nasal height (NH: 13)                     45.1 (21)           47.1 (58)           44.7 (19)           44.9 (30)
Nasal breadth (NB: 14)                    26.9 (19)           27.8 (57)           25.9 (20)           26.3 (29)
Width of alveolar border (18)             66.0 (14)           —                   63.9 (14)           —
Height of alveolar curve (18a)            60.7 (9)            —                   57.1 (11)           —
Breadth of palate (G₂)                    39.5 (15)                               37.6 (13)
Length of palate (G₁’: 19a)               49.6 (10)           —                   48.7 (11)           —
Minimum thickness                         4.4 (25)            —                   3.9 (22)            —
Maximum thickness                         7.5 (25)            —                   6.7 (22)            —
Capacity (C: 24)                          1247.1 (14)         1264.3 (33)         1242.8 (14)         1153.8 (25)
100 B/L                                   74.2 (27)           74.2 (43)           76.4 (24)           75.1 (19)
100 H’/L                                  70.6 (24)           71.3 (37)           72.3 (22)           71.1 (19)
100 B/H’                                  105.8 (22)          103.9 (55)          106.1 (21)          105.7 (34)
100 B’/B                                  69.0 (24)           —                   68.7 (25)           —
100 O₂/O₁’, R                             78.5 (19)           }                   83.8 (20)           }
100 O₂/O₁’, L                             77.8 (19)           } [illegible] (40)  [illegible]         } 83.3 (17)
100 NB/NH                                 59.9 (19)           59.1 (57)           58.6 (19)           59.0 (29)
100 fmb/fml                               81.6 (22)           82.1 (42)           84.7 (22)           83.3 (26)


* Measurements for which both male and female means of the Tasman series are based on 
fewer than ten skulls are omitted. See p. 335 for remarks on the definitions of the measurements. 




322 Cranial and other Skeletal Remains of Tasmanians 


TABLE II 

Mean measurements for Tasmanian and Australian series of skulls 

♂
                                  Tasmanian full-blood    Tasmanian               Tasmanian
                                  (Wunderly)              (Harper & Clarke)       (Turner)

Cephalic index (100 B/L)          74.2 (27)               74.0 (6)                72.5 (8)
Height index (100 H’/L)           70.6 (24)               70.0 (4)                72.0 (7)
Orbital index (100 O₂/O₁’)        78.1 (19)               79.4 (6)                77.3 (7)
Nasal index (100 NB/NH)           59.9 (19)               54.0 (6)                59.8 (7)
Capacity                          1247 (14)               1282 (3)                1235 (7)

                                  Tasmanian               Tasmanian-Australian    Australian full-blood
                                  (Hrdlička)              mixed-blood             in Tasman series
                                                          (Wunderly)              (Wunderly)

Cephalic index (100 B/L)          74.1 (22)               [illegible]             70.4 (3)
Height index (100 H’/L)           —                       69.4 (5)                69.7 (4)
Orbital index (100 O₂/O₁’)        80.3 (21)               79.6 (5)                74.7 (4)
Nasal index (100 NB/NH)           56.7 (20)               51.8 (4)                52.1 (4)
Capacity                                                  1285 (3)                1261 (2)

♀
                                  Tasmanian full-blood    Tasmanian               Tasmanian
                                  (Wunderly)              (Harper & Clarke)       (Turner)

Cephalic index (100 B/L)          76.4 (24)               77.0 (5)                74.2 (1)
Height index (100 H’/L)           72.3 (22)               72.5 (4)                73.0 (1)
Orbital index (100 O₂/O₁’)        83.1 (18)               84.8 (4)                84.6 (1)
Nasal index (100 NB/NH)           58.6 (19)               55.2 (3)                61.0 (1)
Capacity                          1243 (14)               1089 (5)                1260 (1)

                                  Tasmanian               Tasmanian-Australian    Australian full-blood
                                  (Hrdlička)              mixed-blood             in Tasman series
                                                          (Wunderly)              (Wunderly)

Cephalic index (100 B/L)          [illegible] (15)        73.1 (2)                73.3 (6)
Height index (100 H’/L)                                   68.8 (2)                70.9 (5)
Orbital index (100 O₂/O₁’)        84.2 (15)               84.3 (3)                85.9 (6)
Nasal index (100 NB/NH)           58.4 (15)               55.7 (3)                56.9 (6)
Capacity                          —                       1172 (2) [column uncertain]
Norma lateralis. The Eaglehawk Neck specimens, viewed from this aspect, 
are seen to be more rounded than the specialized skulls. The temporal fossae in 
the Eaglehawk Neck skulls are usually a little fuller than in the specialized 
specimens, in some of which they are notably flat. 

Norma facialis. The facial margins of the orbit in the Eaglehawk Neck 
crania do not form such a pronounced rectangle as is seen in many of the 
skulls of specialized form. 

Norma occipitalis. The angularities seen in the irregular pentagonal outline 
of the specialized skull are not so noticeable in the Eaglehawk Neck specimens, 
although the general outline seen in the one form closely resembles in other 
respects that seen in the other form. The “depression” of the posterior one-third 
or one-half of the sagittal suture is deeper in the Eaglehawk Neck male specimens 
than in any other skulls in the Tasman series. 


(ii) Cranial asymmetry 


It is now possible to demonstrate the asymmetry of a skull graphically, 
and in a rough quantitative way, by means of a modified Schwarz drawing 
apparatus. 


(iii) Sutures (fifty-seven crania) 
The only alteration necessary with regard to this characteristic is in respect 
of the total number of crania for which the particulars are now applicable. 


(iv) Ossa suturarum (fifty-two crania) 


                                        Males        Females
                                        28 skulls    24 skulls

Total number of ossicles                64           77
Average per skull                       2.3          [illegible]
Skulls having ossicles:
  Bilaterally in lambdoid suture        28%          37%
  Unilaterally in lambdoid suture       43%          29%
  In occipito-mastoid suture            21%          16%
  At asterion                           1%           33%
Percentage with ossicles in the
  Lambdoid suture                       84           67
  Right lambdoid                        36           36
  Left lambdoid                         36           30
  At lambda                             12           1


One female skull has an ossicle in the right half of the coronal suture; one 
female has one in the sagittal suture, and another has four in the same suture. 
The largest number of ossicles observed in any skull is thirteen in a female 
specimen. 
(v) Pterion (forty-eight crania) 

The pterion is of normal contact bilaterally and of usual size in 28% of the 
twenty-five male skulls, and in 22% of the twenty-three female skulls; and wide 
in 8% of the male skulls, and in 4% of the female skulls. The contact in 4% 
of the male skulls and in 9% of the female skulls is seen to be normal on each 
side, but of the usual size on one side and narrow on the other. Epipteric bones 
completely occupy the pterion bilaterally in 12% of the male and in 4% of the 
female skulls. Two female crania exhibit the pithecoid contact on each side. An 
epipteric bone unilaterally accompanied by a normal contact of usual size 
appears in 20% of the male and in 30% of the female skulls. In one female 
skull the normal contact of usual size is associated with a normal, wide contact 
on the other side, and in another female specimen the contacts on each side are 
fused. 

The pterion of the side remaining in skulls in which the parts of the other 
side are lost through damage is found to be as follows: 

(a) normal and of usual size in four male and one female skull, 

(b) normal and narrow in one male specimen, 

(c) by epipteric bone in one male skull. 
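
The percentages quoted above for the pterion are rounded to whole numbers, so the numbers of skulls behind them can only be inferred. The following is a minimal sketch in Python, not part of the original paper, which assumes that each percentage was obtained by dividing a whole number of skulls by the stated group size (twenty-five male, twenty-three female) and rounding. The function name `count_from_percentage` and the list `reported` are illustrative only.

```python
def count_from_percentage(percent: float, group_size: int) -> int:
    """Nearest whole number of skulls implied by a rounded percentage."""
    return round(percent * group_size / 100)


MALE_SKULLS = 25    # group size stated in the text
FEMALE_SKULLS = 23  # group size stated in the text

# (condition, male %, female %) as quoted in the paragraph on the pterion
reported = [
    ("normal contact bilaterally, usual size", 28, 22),
    ("normal contact bilaterally, wide", 8, 4),
    ("normal contact, usual size one side, narrow the other", 4, 9),
    ("epipteric bones bilaterally", 12, 4),
    ("epipteric bone unilaterally with normal contact", 20, 30),
]

# Inferred counts per condition: (label, males, females)
inferred = [
    (label,
     count_from_percentage(male_pct, MALE_SKULLS),
     count_from_percentage(female_pct, FEMALE_SKULLS))
    for label, male_pct, female_pct in reported
]

for label, males, females in inferred:
    print(f"{label}: {males} male, {females} female")
```

Under this assumption the five conditions account for 18 of the 25 male skulls and 16 of the 23 female skulls; the remainder are covered by the individually described specimens and those with one side damaged.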


(vi) Epipteric bones (forty-eight crania) 


These bones were found bilaterally in 12% of the twenty-five male skulls 
and in 4% of the twenty-three female skulls. They were observed unilaterally 
on the left in 24% of the male and in 30% of the female skulls. One female 
specimen has an epipteric on the right only, but this condition was not seen in 
any male skull. 


(vii) Supra-orbital foramina, notches or grooves (fifty-eight crania) 


Bilateral grooves, some being shallow, were found in 39% of the thirty-one 
male skulls; a foramen on one side and a notch on the other in 13%; a groove 
with a small accessory foramen bilaterally in 13%; a groove on one side and a 
notch on the other in 10%; and a notch and an accessory foramen bilaterally 
in 6%. One skull has a notch bilaterally, while another has a groove on one 
side and on the other a groove with an accessory foramen. In four skulls in 
which the parts on one side are missing, the other exhibits a notch in three 
specimens and a groove in one. 

In the twenty-seven female skulls 30% have a groove bilaterally, and 30% 
a notch bilaterally. Each of the following four conditions was observed in 
two female skulls: 

(a) a foramen on one side and a notch on the other, 

(b) a notch and an accessory foramen bilaterally, 
(c) a groove and an accessory foramen on one side and a notch and an 
accessory foramen on the other, 

(d) a notch on one side and a notch and an accessory foramen on the other. 


(viii) Anterior ethmoid canal (thirty-four crania) 


In the male skulls it was found bilaterally in the suture in 58 % of cases, and 
in the frontal bone and independent of the suture in 10%. In five male specimens 
in which the parts of one side are lost through damage, the canal is in the suture 
of the other side. In one male skull it was seen bilaterally in the frontal bone 
and confluent with the suture. 

In 80 % of the female skulls it is situated in the suture bilaterally, and in one 
specimen only it is in the frontal bone bilaterally and independent of the suture. 
In one female specimen it was found in the frontal bone on one side and in the 
suture on the other. In one specimen in which the parts of one side had been lost 
the canal on the other side is in the suture. 


(ix) Sutures of the inner wall of the orbit 


An abnormal arrangement of these sutures was not observed in any skull 
classified as that of a Tasmanian full-blood. 


(x) Spheno-maxillary fissure (thirty-eight crania) 


This fissure was classified as narrow in 37 % of the male and in 21 % of the 
female skulls; as of moderate width in 53 % of the male and 63 % of the female 
specimens, and as wide in 10% of the male and 16 % of the female crania. 


(xi) Form of the orbit 


The only particulars regarding the Tasmanian orbit which need be added to 
those already published are those concerning the orbit of the Eaglehawk Neck 
group of crania. The markedly rectangular form of the orbit applies to the 
specialized Tasmanian natives, but its form in the crania of the aborigines who 
are believed to have been some of the earliest inhabitants of the island was less 
noticeably rectangular. 


(xii) Infra-orbital foramen (forty-four crania) 


It was decided to classify the foramina separately from the independent 
sutures. A single foramen of usual size was found bilaterally in 50% of the 
twenty-two male and also in 50% of the twenty-two female skulls. A single foramen 
on one side, and a single foramen accompanied by a small accessory foramen on the 
other side, was found in 36% of the male, and in 14% of the female crania. 
The following conditions were found in the numbers of specimens indicated: 


                                                                  Males    Females
(a) Single and an accessory foramen bilaterally                     1         1
(b) Parts on one side lost, remaining side shows single
    normal foramen                                                  1         4
(c) Double foramen bilaterally                                      1         —
(d) A single normal foramen and two accessory foramina on
    one side, and a single foramen and one accessory foramen
    on the other                                                    —         1
(e) A single normal foramen and two accessory foramina on
    one side, and a single normal foramen on the other              —         1
(f) Parts on one side lost, the remaining side exhibits a
    single normal foramen and one accessory foramen                 —         1


A complete independent suture from a foramen to the orbital border was 
found bilaterally in three male and nine female crania. In four female skulls it 
is present unilaterally, and a complete suture on one side and an incomplete 
suture on the other was observed in one female specimen. 


(xiii) Form of the jugal 
The remarks already published still hold good. 


(xiv) The nasal bones (forty-seven crania) 


The nasal bones were found to be normal and symmetrical in 36% of the 
twenty-five male crania, and in 32% of the twenty-two female specimens; 
normal and asymmetrical in 20% of the males and 23% of the females; narrow 
and symmetrical in 28% of the males and 14% of the females; narrow and 
asymmetrical in 4% of the males and 9% of the females; wide and symmetrical 
in 8% of the males and 18% of the females; and wide and asymmetrical in 
4% of the males and 4% of the females. The internasal suture is fused in one 
and partly fused in three male skulls. 


(xv) The narial aperture (thirty-nine crania) 


The specimens in which the lateral margins of the aperture are “almost 
parallel-sided” were found to be unauthentic. 


(xvi) The nasal septum 


This is present in only one male skull, in which it is slightly deflected, and 
in three female crania, in two of which it is normal and in the other deflected. 
(xvii) The foramen ovale (fifty crania) 

The average size of this foramen was found to be from 5 to 6 mm. long, by 
3 mm. wide. The following conditions were found in the twenty-seven male and 
twenty-three female skulls: 

                                                                  Males    Females
(a) Foramen complete and of average size                           59%       39%
                                                                     Specimens
(b) Complete and small bilaterally                                  3         2
(c) Incomplete and confluent with the foramen spinosum
    on one side, and complete and of average size on the
    other                                                           1         2
(d) Incomplete and confluent with the foramen spinosum
    bilaterally                                                     3 [column unclear in source]
(e) Incomplete on one side and complete and of average
    size on the other                                               2         1
(f) The parts on one side are lost, and on the other the
    foramen is complete and of average size                         2         2
(g) Ditto and the foramen on the remaining side is
    incomplete                                                      1 [column unclear in source]


The following conditions were observed once in female skulls: 

(a) the parts on one side are lost and on the remaining side the foramen is 
incomplete and confluent with the foramen spinosum, 

(b) ditto and on the remaining side the foramen ovale is complete and round, 

(c) complete and round bilaterally, 

(d) complete and round unilaterally, and on the other side the foramen is 
incomplete and confluent with the foramen spinosum. 


(xviii) The foramen of Vesalius (thirty-seven crania) 


In the seventeen male skulls the foramen is present and complete bilaterally 
in 41%; absent bilaterally in 12%, and present and complete unilaterally in 
47%. In the twenty female crania it is present and complete bilaterally in 50%; 
absent bilaterally in 25%, and present and complete unilaterally in 20%. In 
one specimen it is present and incomplete bilaterally. 


(xix) Foramen spinosum (forty-nine crania) 


Only the external orifice of the canalis spinosus was examined. In the 
twenty-six male skulls it is complete bilaterally in 35%, and unilaterally in 
31%, while in 15 % it is incomplete bilaterally. In four skulls in which one side 
has been lost the foramen on the remaining side is complete in three and in- 
complete in one. This incomplete foramen is confluent with the foramen ovale, 
and one of the complete foramina has a double orifice. The double orifice is seen 
in two male skulls, and in four the confluence between the foramen spinosum 
and the foramen ovale is noticeable. 

In the twenty-three female crania it is seen to be complete bilaterally in 30%, 
and unilaterally in 26%, while in 22% it is incomplete bilaterally. In five skulls 
in which the parts of one side are lost the foramen on the remaining side is 
complete in three and incomplete in two. In three specimens the incomplete 
foramina are confluent with the foramen ovale, while in another three the 
complete foramina are situated high on the spina angularis. One skull exhibits 
a double orifice bilaterally. 
(xx) Spina angularis sphenoide (fifty crania) 

In the twenty-seven male skulls it is short and blunt bilaterally in 44%, 
and short and blunt one side and short and sharp on the other in 18%. Each of 
the following conditions was observed twice in the male skulls: 

(a) short and sharp bilaterally, 

(b) conical bilaterally. 

One specimen exhibits a blunt arrow-head bilaterally and another skull 
possesses a long sharp arrow-head bilaterally. In four skulls in which the parts 
on one side are lost the spina on the other side is short and blunt in two, short 
and sharp in one, and conical and sharp in one. 

In the twenty-three female crania it is short and blunt bilaterally in 43%, 
and short and sharp bilaterally in 13%. In two specimens it is short and blunt 
on the one side, and short and sharp on the other. One specimen has a broad 
and flat spina bilaterally and another a short bifid spina bilaterally. 

In each of five skulls in which the parts of one side are lost the other side 
exhibits the following conditions: 

(a) short and sharp spina, 

(b) the spina consists of a high ridge, 

(c) short and blunt, 

(d) sharp arrow-head, 

(e) long and sharp. 

In one specimen a short sharp arrow-head is seen on one side and a sharp 
bifid spina on the other. 


(xxi) Laminae pterygoidei (forty-four crania) 

The attached margin of the lateral laminae fades away as a ridge close to the 
anterior margin of the foramen ovale in 64% of the twenty-five male skulls. In 
12% it fades laterally, and in 8% medially to the foramen. It is medial to the 
foramen bilaterally in one skull. In each of three crania in which the parts on 
one side are lost, the other side shows the ridge ending at the anterior margin 
of the foramen. 
In 74% of the nineteen female crania the ridge ends anterior to the foramen. 
Each of the following conditions was observed once: 

(a) the ridge is lateral to the foramen bilaterally, 

(b) the ridge is medial to the foramen bilaterally, 

(c) the ridge is medial on one side and lateral on the other, 

(d) the ridge ends at the anterior margin of foramen ovale on one side, and 
is lateral to the foramen on the other, 


(e) the parts on one side are lost, and on the other side the ridge ends 
anterior to the foramen. 


(xxii) The jugular fossa and foramen (thirty-nine crania) 

In 62% of the twenty-one male skulls the foramen on the right is larger than 
that on the left; in 14% they are equal in size, and in 24% the left foramen is 
larger than the right. In the eighteen female crania the right foramen is larger 
than the left in 72%; they are equal in size in 17%, and in 11% the left foramen 
is larger than the right. 


(xxiii) The tympanic region (forty-eight crania) 

In 63 % of the twenty-seven male skulls the conditions found were classified 
as normal bilaterally. Exostoses in the external auditory meatus were observed 
bilaterally in 11%. The floor of the mouth of the external auditory meatus was 
regarded as thick bilaterally in two skulls, and thin bilaterally in two others. 
The following conditions were observed once in different specimens: 

(a) the margins of the meatus are rough bilaterally, 

(b) a rough ridge occurs on the lower surface bilaterally, 

(c) the parts of one side are lost and the other side exhibits a ridge higher 
than the spina angularis. 

Greater variability was seen in the female crania. In 29% of the twenty-one 
female specimens the conditions were classified as normal bilaterally and in 
29% the margins of the meatus are rougher than usual bilaterally. The bone of 
the floor of the meatus is thick bilaterally in 14%, and thin bilaterally in 14%. 
In one skull exostoses in the external meatus bilaterally were observed. One 
cranium shows exostoses unilaterally, and on the other side the conditions are 
normal. A rough ridge on the lower surface is exhibited bilaterally in one skull. 


(xxiv) Foramen of Huschke 
Not observed in any skull. 


(xxv) Styloid process (twenty-five crania) 

In the thirteen male crania it is short and small in cross-section bilaterally 
in 69% of cases, and rudimentary in two specimens. It was recorded as being 
short and thick bilaterally in one skull. In one specimen in which the parts on 
one side are lost the process on the remaining side is rudimentary. 
In 66% of the twelve female skulls it is short and small in cross-section 
bilaterally, and in two specimens it is rudimentary bilaterally. In one skull it 
is short and thick bilaterally, and short and small in section unilaterally in one 
specimen in which the parts on one side are lost. 


(xxvi) The posterior condyloid foramen (forty-one crania) 

In the twenty male crania it is present unilaterally, on the right side, in 
40% of cases; bilaterally in 20%; unilaterally, on the left side, in 5%; and 
absent from both sides in 35%. It is present bilaterally in 29% of the twenty- 
one female skulls, present unilaterally in 42%, and absent bilaterally in 29%. 


8. SUMMARY 

The author has investigated, during the past eight years, the physical 
characteristics of the extinct Tasmanian aborigines. The present article is a 
report on the anatomical aspects of the enquiry. 

The remains of the Tasmanians consist chiefly of crania and a small number 
of other bones. A critical anatomical examination of the crania claimed to be 
of Tasmanian origin and contained in collections in the Commonwealth of 
Australia, together with reliable collateral evidence, reveals that some of the 
specimens are not authentic. The basis of the anatomical diagnosis of their 
racial origin is described. 

A critical examination is made of several reports published since 1898 and 
their values are assessed. 

The crania examined are numbered in a series known as the Tasman series, 
their numbers being related to those allotted in other enquiries. The classification 
of racial origin has been tabulated (Appendix I) to show its relation to that 
adopted by each of several other enquirers. 

Special reference is made to osteological remains found at Eaglehawk Neck, 
and the probable significance of their particular characteristics is discussed. 

The non-metrical morphological characteristics of the crania classified as 
those of Tasmanian full-bloods are recorded. 

Turner gave such a comprehensive account of Tasmanian craniology that 
little can be added to it as the result of the present enquiry. He inferred that the 
Tasmanians were direct descendants from a primitive Negrito stock which had 
migrated across Australia. He also considered that they had become specialized 
in many ways as the result of long isolation. The observed intra-racial differences 
between the Eaglehawk Neck, or “old”, type of crania and the “recent” type 
may indicate progressive specialization, resulting from long occupation in a 
restricted environment. These differences constitute the only anatomical evi- 
dence found, during this enquiry, which has a bearing on the length of time 
during which the Tasmanians inhabited their island. The extent of the differences 
seems to point to the probability of a lengthy time period. 
During the present enquiry no instance of cranial deformity of any kind, or 
customary tooth extraction, has been observed in any skull classified as that 
of a Tasmanian full-blood. Neither dental caries nor any other pathological 
condition was noted in skulls of Tasmanian full-bloods who lived in the natural 
state prior to contact with civilized people. The palate of the Tasmanian full- 
blood is wide and only moderately high, and in many cases it has a well-defined 
maxillary palatine torus. The teeth of the Tasmanians are smaller than those 
of the Australians. 


My thanks are due to all who have permitted me to examine the Tasmanian 
remains which are in their charge, and to the University of Melbourne for 
financial assistance. I am particularly indebted to Dr G. M. Morant for 
reviewing and correcting the typescript and arranging it for publication; to 
Mr D. J. Mahony, M.Sc., Director of the National Museum, Melbourne, for 
revising the manuscript; to Dr E. Ford, Lecturer in Anatomy in the University 
of Melbourne, for checking the measurements of the limb bones; to Acting 
Professor M. H. Belz, of the University of Melbourne, for carrying out some 
of the calculations, and to Mr W. H. Preston for photographing the skulls. 
APPENDIX I 
The Tasman series of skulls 


At the commencement of the examination of specimens catalogued as Tasmanian in 
Commonwealth collections it was found to be impossible to identify readily a majority of 
them from the descriptions previously published, because of unsystematic and individual 
methods of numbering. It was decided, therefore, to label each skull “Tasman” series with 
a new serial number. The numbers from 1 to 52 in this series are the same as those allotted 
by Berry & Robertson (1909a). The Tasman series is made up by the remains of 114 adult 
and juvenile individuals represented by complete or incomplete skulls, and in some cases 
by mandibles only. The following particulars are given in the lists and folding table 
(Appendix III) below. 

(i) The collection in which each specimen is preserved at present and its number in this 
collection. The abbreviations used are: Public collections. A.M.S. = Australian Museum, 
Sydney; I.A.C. = Institute of Anatomy, Canberra; M.C.D. = Municipal Council, Devon- 
port; N.M.M. = National Museum, Melbourne; Q.V.M. = Queen Victoria Museum, Laun- 
ceston; S.A.M.= South Australian Museum, Adelaide; T.M.H. = Tasmanian Museum, 
Hobart; U.M. = University of Melbourne. Private collections. A.L.M. = A. L. Meston, Esq., 
M.A., Launceston; G.R. = Gilbert Rigg, Esq., Melbourne; W.I.C.= Dr W. I. Clark, 
Hobart; W.L.C. = Dr W. L. Crowther, Hobart; H.A. = Howard Amos, Esq., Cranbrook, 
Tasmania. 

(ii) The number allotted to each specimen in previously published papers. The sources 
are given in the references above and the abbreviations denoting authors used are: 
B. & R. = Berry & Robertson; H. & C. = Harper & Clarke; H. = Hrdlička; S. = Ramsay Smith; 
W. J. & C. = Wood Jones & Campbell. 

(iii) The sexes of the specimens. The sexes given are those decided on by the writer 
after examination of the skulls, and unless otherwise indicated these are the same as those 
adopted by the earlier investigators of the material. Juvenile skulls and isolated mandibles 
are not sexed. Remarks on sexing are given on p. 319 above. 

The essential aim of the enquiry described in the present paper was to distinguish be- 
tween the skulls which are, and those which have been alleged to be but which in fact are 
not, remains of full-blood Tasmanians. This question is discussed fully in the text above. 
The following eight groups were distinguished and all the 114 specimens are assigned to 
one or other of these: 

Section A. Tasmanian full-blood: sixty-seven specimens. Particulars and measurements 
of the thirty-one adult male and twenty-seven adult female skulls assigned to this group 
are given in the table of individual measurements (Appendix III). The following specimens 
are also included in it: 


No.            Description       Museum No.     Collection

82             Juvenile skull                   T.M.H.
90             [illegible]
70             Mandible          A. 16541       S.A.M.
97             ”                 A. 580         ”
98                               A. 2210
100                              23389B         N.M.M.
[illegible]                                     A.L.M.


In the present state of our knowledge, it is not possible to distinguish at all accurately 
between mandibles of Tasmanian full-bloods and those of Tasmanian-Australian mixed- 
bloods, and hence the allocation of these eight mandibles to the Tasmanian full-blood group 
is particularly uncertain. 


Section B. Australian full-blood : twelve specimens. 


No. in
Tasman         Description       Museum No.     Collection     No. in earlier papers
series

25             Adult ♂ skull     1201           Q.V.M.         B. & R. 25
59             ”                 T (d)          I.A.C.
112            ”                 19             W.L.C.         —
113            ”                 —              Q.V.M.
49             Adult ♀ skull     12             W.L.C.         B. & R. 49
51             ”                 12922          N.M.M.         B. & R. 51
63             ”                 A. 577         S.A.M.         W. J. & C. 577; H., A. 577
[illegible]                      A. 1649        T.M.H.
87                               15             W.L.C.
88             ”                 16             ”
68             Juvenile skull    A. 16539       S.A.M.         —
64             Mandible


The three of these skulls described by Berry & Robertson were accepted as Tasmanian 
by them. No. 63 was accepted as Tasmanian by Wood Jones & Campbell, and Hrdlička 
describes it as being of Australian type. Nos. 63 and 68 were found on the west coast of 
Tasmania. 
Section C. A male skull (Tasman series No. 114) in the National Museum, Melbourne, 
which is apparently that of an individual who had no Tasmanian or Australian ancestors. 
Section D. Tasmanian-European mixed-blood: seven skulls. 
No. in
Tasman         Sex       Museum No.     Collection     No. in earlier papers
series

56             ♂         A. 2228        T.M.H.         —
11                       4292                          H. & C. 12; B. & R. 11
14             ♀         4290           ”              H. & C. 34; B. & R. 14
15                       4295                          B. & R. 15
16                       4296                          B. & R. 16
52             ♀         129974         N.M.M.         B. & R. 52
105            ♀                        H.A.


Five of these skulls described by Berry & Robertson were accepted as Tasmanian 
by them. No. 11 was accepted as Tasmanian by Harper & Clarke and they group No. 12 
as half-caste. 

Section E. Australian-European mixed-blood: three female skulls. 


No. in                                    No. in 
Tasman    Museum No.    Collection    earlier papers 
series 
55        23389         N.M.M.        — 
96        1221          Q.V.M. 
101       B. 3496       A.M.S.        S., A 


No. 101 was accepted as Tasmanian by Ramsay Smith. 

Section F. Tasmanian-Australian mixed-blood: nine skulls. Particulars and measure- 
ments of eight of these are given in Appendix III. The other is a male specimen (Tasman 
series No. 71) in the Tasmanian Museum, Hobart, where it is numbered 11509. Five of 
the skulls (Nos. 54, 65, 66, 67 and 109) were found at the northern end of the west coast, 
and three others (Nos. 29, 30 and 31) about 80 miles distant on the north coast. 

Section G. Apparently skulls of individuals of mixed blood with no Tasmanian or 
Australian ancestry. 


No. in 
Tasman Sex Museum No. Collection No. in earlier papers 
series 
13 3 4297 T.M.H. H. & C. 24; B. & R. 13 
12 4302 H. & C. 14, 3; B. & R. 12 


According to Harper & Clarke both these skulls are those of half-castes, and Berry & 
Robertson accepted them as Tasmanian. 

Section H. Skulls which have been lost, or which are too fragmentary to yield reliable data: 
thirteen specimens. 


No. in 
Tasman Museum No. Collection No. in earlier papers 
series 
21 1572 T.M.H B. & R. 21, 2 
22 A. 506 = B. & R. 22, 9? 
23 A. 507 sy B. & R. 23, 3? 
24 ua: (Lost) B. & R. 24, 3 
39 — (Lost) B. & R. 39, 3 
41 4 W.L.C B. & R. 41, 2 
47 10 oe B. & R. 47, 
48 11 a B. & R. 48 
60 T (n) L.A.C — 
72 11554 T.M.H — 
99 D. 607 5 
104 1254 AMS. 8., C 
108 T.M. 1644 T.M.H 


Seven of the eight of these skulls described by Berry & Robertson are accepted by them 
as Tasmanian, the other (No. 48) being supposed half-caste if not Tasmanian. No. 104 is 
classed as “Tasmanian?” by Ramsay Smith. No. 3 in Harper & Clarke’s list is lost and it 
is not included in the Tasman series. 

The cranial measurements in Appendix III were obtained by following the definitions 
of the International (Monaco) Agreement of 1906, a translation of the report being given by 
Hrdlička in his Anthropometry (1920). The numbers of this list are given and also the letters 
denoting the measurements customarily employed in craniological papers in Biometrika. 
The additional measurements of the “minimum” and “maximum” thickness of the left 
parietal were obtained by following Hrdlička’s instructions (op. cit. p. 107). These are: 

“Introduce one branch of compass into the cranial cavity, apply to anterior part of 
the lower portion of the parietal approximately 1 cm. above the squamous suture, bring 
other branch in contact with the bone externally, and pass backwards at about the same 
distance from the sutures, watching the scale of the instrument. Record observed minimum 
and maximum.” 

The cranial capacity was determined with fine spherical seed and this was packed as 
tightly in the skull as in the glass measuring cylinder, as far as could be told. 


APPENDIX II 
Limb bones found at Eaglehawk Neck 


An account of the discovery of this material and a description of the skulls are given 
in Section 5 of the text. Considering the scarcity of Tasmanian limb bones, the discovery 
of a number of them at Eaglehawk Neck is of importance. Nine femora and six tibiae 
are sufficiently well preserved to provide reliable data. Unfortunately it is impossible to 
identify, with certainty, any two or more bones as having belonged to one individual. For 
this reason, the femora were given numbers and the tibiae letters. The bones of the upper 
limb were badly damaged at the time of unearthing, and are unsuitable for reliable 
measurements. 

The measurements of the femora and tibiae, hitherto unrecorded, were made in accord- 
ance with the directions supplied by Wood Jones (1929a). They are: 

Femur. 1. Maximum length. 2. Oblique length. 3. Maximum trochanteric length. 
4. Oblique trochanteric length. 5. Antero-posterior diameter. 6. Lateral diameter. 
7. Circumference of shaft. 8. Subtrochanteric transverse diameter. 9. Subtrochanteric 
antero-posterior diameter. 10. Maximum diameter of articular surface of head. 11. Minimum 
diameter of articular surface of head. 12. Epicondylar breadth. 13. Condylar breadth. 

Tibia. 1. Maximum length. 2. Direct length. 3. Axial length. 4. Breadth of the 
condyles. 5. Antero-posterior diameter of shaft. 6. Transverse diameter of shaft. 
7. Transverse cnemic diameter. 8. Sagittal cnemic diameter. 9. Antero-posterior diameter 
at level of tuberosity. 10. Transverse diameter at level of tuberosity. 11. Minimum 
circumference of shaft. 

Measurements of femora 

Serial |            | No. of measurement 
No.    | Museum No. | 1     | 2     | 3     | 4     | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12   | 13 
1      | A(E.H.)756 | 480   | 478   | 467   | 455   | 33   | 31   | 102  | 42   | 31   | 48   | 42   | 84   | 78 
2      | 762        | 470   | 466   | 458   | 445   | 31   | 27   | 93   | 37   | 26   | 45   | 42   | 82   | 78 
3      | 757        | 459   | 456   | 445   | 437   | 32   | 28   | 94   | 38   | 30   | 46   | 41   | 78   | 73 
4      | 763        | 457   | 455   | 444   | 435   | 33   | 30   | 96   | 38   | 29   | 47   | 44   | 79   | — 
5      | 759        | 436   | 435   | 430   | 425   | 31   | 27   | 90   | 38   | 31   | 43   | 40   | 74   | — 
6      | 758        | 415   | 413   | 406   | 395   | 28   | 22   | 81   | 35   | 25   | 41   | —    | 72   | 68 
7      | 761        | 491   | 487   | 475   | —     | 35   | 31   | 105  | 43   | 31   | 48   | 44   | 84   | 80 
8      | 766        | —     | —     | —     | —     | ?    | ?    | ?    | 37   | 26   | 40   | —    | —    | — 
9      | 764        | —     | —     | —     | —     | 31   | 28   | —    | 40   | 32   | 43   | 40   | —    | — 
Mean   |            | 458.3 | 455.7 | 446.4 | 432.0 | 31.3 | 27.4 | 92.3 | 38.7 | 29.0 | 44.6 | 41.9 | 79.0 | 75.4 

Serial | Length      | Thickness   | Platymeric  | Head-length  | Condyle-length 
No.    | index (7/1) | index (5/6) | index (9/8) | index (10/1) | index (13/1) 
1      | 21.3        | 106.5       | 73.8        | 10.0         | 16.3 
2      | 19.8        | 114.8       | 70.3        | 9.6          | 16.6 
3      | 20.5        | 114.3       | 78.9        | 10.0         | 15.9 
4      | 21.0        | 110.0       | 76.3        | 10.3         | — 
5      | 20.6        | 114.8       | 81.6        | 9.9          | — 
6      | 19.5        | 127.3       | 71.4        | 9.9          | 16.4 
7      | 21.4        | 112.9       | 72.1        | 9.8          | 16.3 
9      | —           | 110.7       | 80.0        | —            | — 
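
Each femoral index in the second table is 100 times the ratio of the two measurements named in its column heading. The following Python sketch recomputes the indices for one bone as a check on the printed figures; the function and variable names are illustrative only and do not come from the paper.

```python
# Each index is 100 * (measurement a) / (measurement b), with the pair (a, b)
# taken from the column headings of the index table. Names are hypothetical.
INDEX_DEFS = {
    "length": (7, 1),
    "thickness": (5, 6),
    "platymeric": (9, 8),
    "head_length": (10, 1),
    "condyle_length": (13, 1),
}


def femoral_indices(measurements):
    """Return the five indices for one femur.

    `measurements` maps measurement number (1-13) to a value in mm,
    or to None where the bone was too damaged to measure.
    """
    indices = {}
    for name, (num, den) in INDEX_DEFS.items():
        a = measurements.get(num)
        b = measurements.get(den)
        indices[name] = None if a is None or b is None else round(100 * a / b, 1)
    return indices


# Femur No. 2 (Museum No. 762), as printed in the table of measurements.
femur_2 = dict(zip(range(1, 14),
                   [470, 466, 458, 445, 31, 27, 93, 37, 26, 45, 42, 82, 78]))
# Femur No. 4 lacks measurement 13, so its condyle-length index is undefined.
femur_4 = dict(zip(range(1, 14),
                   [457, 455, 444, 435, 33, 30, 96, 38, 29, 47, 44, 79, None]))

if __name__ == "__main__":
    print(femoral_indices(femur_2))
    print(femoral_indices(femur_4))
```

For femur No. 2 this reproduces the printed row (19.8, 114.8, 70.3, 9.6, 16.6).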
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No. in earlier papers 


I 4288 T.M.H. | H.& C.1; B.& R.1 | 184 180 136 122 94 126 

3 4300 a H. & C. 4; B. & R.3 » | 193 193 141 116 126 98 — 

i 438 A. 1648 ie B. & R. 18 Rep ae 172 146 112 119 96 127 
27 1203 Q.V.M. | B. & R. 27 » | 180 173 — 114 120 88 -~ 

} 28 1204 ‘a B. & R. 28 BY 185 180 144 III 117 95 135 
Be I U.M. B. & R. 32 bo 1 Ree 172 135 105 109 95 123 
{ 33 2 a B. & R. 33 ae ie 185 142 112 126 97 136 
} 4 »” B. & R. 35 » | 179 172 131 107 115 99 127 
} 36 5 a B. & R. 36 ss 183 176 135 107 115 95 125 
)} 40 3 B. & R. 40 > 1c wee 178 133 108 114 88 130 
\ 43 6 He B. & R. 43 Sac 185 137 117 121 97 133 
44 I W.LC. | B. & R. 44 » | 188 178 139 III 125 99 138 

45 8 W.L.C. | B.& R. 45 177 171 120 128 

46 9 z B. & R. 46 » | 186 179 140 113 118 99 136 

57 18 » | 4176 170 136 107 gI 

61 I G.R. _- se 187 181 136 IIo 121 95 132 

69 | A.16540(P4) | S.A.M. | W.J.&C., P. 4; H., D 184 136 112 125 131 

74 A. 499 188 137 128 93 129 

79 A. 556 ” — » | 196 193 139 — 132 100 = 

80 A. 557 » | 199 | 197 143 119 | 122 

83 A. 2876 = — as 175 173 135 1or 120 86 126 

86 14 W.L.C. 185 175 136 119 88 12t 

89 17 198 187 145 121 105 140 

oI Q.V.M — 17I 168 132 105 IIo 89°5| 126 

94 1219 183 176 136 II3 96 130 

95 1220 ss 181 136 Ill 117 94°5| 138 
107 H.A. 179 140 117 93 123 
iro L. 2/131 T.M.H — = 190 183 138 113 126 130 

2 4291 T.M.H. | H. & C. 2, 3; B.& R. 2,3 | 9 193 190 143 112 122 101 130 

4 4301 H. & C. 5,3; B. & BR. 4,3 | »» 180 173 138 107 118 90 134 

5 4298 H. & 3; B.& R.5,3],, | 180 178 137 109 124 92 

6 4287 is H. & C.7; B. & R.6 a 170 167 131 103 113 88 132 

7 4293 be H. & C.8; B. & R.7 “i 173 167 134 102 117 86 118 

8 4289 B H. & C.9; B. & R.8 » | 175 174 132 117 112 97 125 

9 3362 iat H. & C. 10; B. & R.9 “ 176 173 132 107 112 90 126 

4294 H. & C. 11, 9?; B.& R. 10 | ,, 17! 165 133 106 86 124 

17 4303 B. & R. 17 137 107 116 95 

19 A. 1646 B. & R. 19, 3 173 112 — 97 135 

20 A. 1647 # B. & R. 20 ve. 280 173 137 107 117 96 132 

26 1202 Q.V.M. | B. & R. 26 an 179 169 139 106 113 93 123 

34 3 UM. | B.& R. 168 138 114 123 go 128 

37 6 a B. & R. 37, 3 » | 169 165 124 99 = 88 = 

38 7 = B. & R. 38, 3 “. 183 172 144 119 124 102 132 

42 5 W.L.C. | B. & R. 42, 3 ood > Sas 170 131 -—— 118 93 125 

5° 13 ” B. & R. 59, 3 ” 95 

53 12997B N.M.M. a eh 181 175 138 108 112 98 134 

58 T ()) L.A.C, == ER 183 180 142 109 117 93°5| 131 

62 A. 576 S.A.M. | W. J. & C. 576 oe ee 175 143 112 126 93 132 

75 A. 551 T.M.H. — iss 166 163 129 106 105 87 — 

76 A. 552 . — » | 175 173 133 108 —_— 89 129 

78 A. 555 188 178 138 106 96 132 

92 1212 Q.V.M. 177 170 136 108 113 93 129 

93 1218 181 179 133 117 95 135 
102 8. 404 AMS. | S., B oord eae 166 137 110 120 92 132 
106 H.A. » 1 378 172 135 104 III 125 
30 — M.C.D. | B. & R. 30f 3 | 186 183 131 103 117 94 125 

54 28899 N.M.M 189 188 136 95 137 

| 65 | A.16536(Pr) | S.A.M. | W.J.&C., P.1; H., Ct 1. 303 183 138 106 — — a 
66 | A. 16537 (P2) a W. J. & C., P. 2; H., BY o> | 204 202 142 108 127 95 135 
109 A, 22275 i H., A§ és 191 182 143 III 118 95 132 
29 1205 Q.V.M. | B. & R. 290, d+ Q 196 Ig! 14! 121 120 115 134 

M.C.D.:| B. & R. 31+ 177 126 102 90 123 

A. 16538 (P3) | S.A.M. | W. J. & C., P. 3; H., Et | 182 176 135 105 89 126 


Museum collection) L:1 pe B:3 | B’:6 B’:5 | H’:40 
series 
| 
| 
| 
| 
| 
| 
an 
* See p. 335 for an explanation of the letters and numbers denoting measurements and pp. 332-3 for a 
| 


Individual measurements of adul 


|Prosth.| fml: | fmb: S,: : | Broca’s WH: 
LB:9 GL:10 U:230 22 (i) 22 (ii) | 22 (iii) 8:22 Q’:23 GH:11 12 
97 105 35 29 512 123 125 122 370 280 —- 68 
95 100 32 26 520 135 132 116 383 310 — 57 
99 98 37 30 520 133 126 116 375 304 Se 65 
93 —- 31 27 500 132 125 105 362 289 — — 
104 _ 39 30 520 126 131 116 373 298 —_— 66 
97 —_ 36 29 493 127 127 102 350 286 — — 
99 509 126 124 290 59 
IoI 109 38 31 528 140 130 108 378 304 — 67 
100 38 30 518 130 | 137 381 298 | 
94 — | 34 30 117 132 114 363 280 | — 
96 — 32 28 520 138 136 120 394 | 305 104 — 
= = — = 495 121 132 106 359 | — — _— 
98 _- 34°5| 32°5] 523 130 130 118 378 292 — — 
40 30°5| 520 120 130 114 364 | 300 
35 24°5| 512 128 130 114 372 287 
532 130 130 — 285 — 
89 94 36 29 488 { 118 122 110 350 277 423-4 62 
98 107 36°5| 29 Six | 120 117 362 283 61 
III — 36 29 542 | 128 139 123 390 | — _— — 
9I 90 35 28 480 117 131 102 350 270 os 56°5 
98 99 36 30 511 128 135 110 373 295 —_ 62 
101-5} 101 35 29 508 135 130 105 370 303 — 61 
_ ae 34 29 516 130 130 113 373 283 _— _ 
101 — — 527 131 139 302 
95 = 35 28:5; — = 114 
102 IIo 35 29 540 133 137 120 390 300 _ 73 
100 32 33 506 121 135 368 287 — 
— 508 126 130 106 362 289 
90 94 32 26 480 118 | 132 114 364 280 99 61 
90 92 33 2 488 119 | 120 104 343 270 — — 
gI — 34 2 500 125 | 128 107 360 282 —- |j— 
88 — | % 30 | 488 127 117 355 277 
92 28 478 120 122 102 344 281 
131 | 131 mmr | 373 | — | 
S$ | — | 3 | 29 | 504 123 130 III 364 | 288 | — | — 
93 I0oo0 | 38 27 506 123 I22 | 114 359 | 285 — | 62 
90 498 124 | 130 106 360 204 |j — 
98 — 37 33 520 127 | 136 | x10 373 307 — | _— 
94 34 | 28 | 490 | 115 352 | 285 |— 
99 — 33°5| 30 | 505 126 | 127 | 117 370 300 — |— 
95 96 a oe 515 133 136 | 112 381 295 Ior | 61 
99 97 32 | 27 518 133 | 125 | 113 371 308 
= | 47 116 Iz | — 273 | — | 46 
95 97 36 30°5; — 123 124 106 353 —a = | 59 
98 34 | 27 513 128 12? | 120 375 
93 93 37 | 29 | 493 128 127 107 362 286 | — | 58 
97 Ior_ | 36 | 30 509 131 134 | 103 368 284 | — | 66 
9r | of 37 32°5| 502 128 125 | 1153 306 288 | 96 | 55 
28 504 127 127 | 113 367 280 | — | 
| | 
Skulls of Tasmanian-. 
27 510 126 132 113 371 270 i 70 
30 520 132 128 113 373 295 = a 
_ 526 126 134 119 379 
33 553 135 134 130 399 300 =e 74 
29 525 129 137 120 386 295 121 79 
31 545 140 122 130 392 304 115 68 
32 125 122 272 62 
30°5| 510 126 129 115 370 285 114 70 


} Berry & Robertson ac 


& 
1 99 | 127 | | 
= 
126 | 114 
96 130 119 
%6 3 114 
| 
96 132 114 
93 | 123 | 109 
102 132 117 
95 135 112 
95 137 116 | 
95 | 132 | 113 
89 126 112 
Lats ON Se of the contractions used in columns 3 and 4. Fe 
t 


APPENDIX III 
Individual measurements of adult skulls of Tasmanian full-bloods and Tasmanian-Australian mixed-bloods* 


Skulls of Tasmanian full-bloods 
| Orb.- Inter- Alv. ,, | Min. | Max. 
aly. | | orb. | % | arch. | | | thick. |thick-| C:24 
ht.: 20 Bzt5 7 7 13 *18 7196 | 190 ness | ness 
~— 68 41 129 22 38 37 33 33 48 27 68 62 38 51 6 7. «| -a270 
a2) 39 | 3 25 | 25] | — 7 8 = 
57 39 23 37 36 27 27 42 26 46 6 Il 1250 
— 40 — 22 38 38 28 28 44 28 — 
— = 38 — 21°5| 39 39 29 29 2 28 | — _ 44 — 3 6 1150 | 
36 38 38 30 30 40 27, ; 645} — | 405 — 
— 67 36 — 2 38 38 34 33 50 30 66 61 | 40 — — —_ —_— 
= = 139 23 38 39 30 20 44 30 3 9 | 1336 
_ — — — — _ _ — — — 5 6 
= 131 22 36°5| 30°5| 30°5| 26°5 | 67°95; — | 385] 52 4 7 1362 
— 64 40 — 22 — 36 28 41 25°5| 62 63 | 34 53 4 7 a 
113 62 qo | 124 21 37 37°5| 30 20°55 44 24 66°5| 62 43 51 4 7 1106 
112 61 44 — 22 — 39 — 27 47 26 70 62 40 52 4 7 — 
56°5| 33 21 37 37 30°5| 42 25 62 54 37 44 3°5| 65] 1140 
62 39 23 37 37 28:5] 29 46 25 62 62 37 7 | 1320 
-- 61 39 126 22 39 37 29 30 45 27 65 60 39 49 | 4 7 1316 
_ 73 48 137 25 40 38 31 29 | 50 29 71 64 45 52 — _ 1498 
36 — 22 38 33 49 | 27 = Bes 3 
99 61 34 123 23 36 36 30 30 44 26 58 56 31 49 3.1 Qe 1098 
123 20 36 36 31 30 44 25 4 7 1080 
— 125 21 38 39 3] 5 7 II50 
22 | 38 32 |. 27 | 66 3 6 
os 62 37 121 22°5| 37 37 2 29 2 24 | 64 | 59 36 Slt oy 6 1296 | 
244) 3 | 35 | 27 | 27 | 45 | 27 | — | — |] — |] = 8 1130 
| 33 20°5| 37 37 29 | 31 42 28 | 61 36 3 6 
ear 62 | 36 Pars 2 35 35 29 29 44 25 66 54 39 47 ee “a aE 
zor | Gr | 38 | 129 | 23 | 37 | 37 | 3t5| 385] 445] 25 | 63 | Go | 96 | 52 |. a 7 | 1252 
_— 69 2 135 | 20 38 39 35 34 50 26 — 58 — 48 5 8 1322 
46 26 33 32 28-5} 29 37°5| 24°5| 59 51 33 | — 4 6 
Soe. 59 37 a | 21 35 35°5| 30 31 44°5| 26 62°5| 57 36:5} 475] 4 7 =. 
} — 58 | 37 | — | 215] 385] 38 31 30 | 43 25°5| 66 57 | 385] 465] 3 7 — 
— 66 | 45 | 129 | 22:5] 37 39 30 30 47 26 68 59 4o 49 4 6 1220 
| 96 S51 ar 124 | 18 37 38 28:5} 29 42 24 63 53 40 45 _— _ 1362 
| 35 39 | 39 | 34 | 34 | — | 225] 63 | — | 40 | 49 4 8 | 1160 


Skulls of Tasmanian-Australian mixed-bloods found in the domain of the west coast tribe 


= 70 45 a 23 38 36 31 31 49 27 71 63 42 56 3 9 1148 
42 139 37 37 30°5| 51 30 67 39 49 4 7 
~ 74 46 141 23°5| 42 43 31 33 51 27°5| 72 66 41 59 4 8 1383 = 
121 79 47 134 23°5| 40 42 31 30 57 22°5| 69 61 41 54 3 7 1323 112 
II5 68 45 130 24 38 38 32 32 52 28 70 61 44 52 5 9 1170 119 
yee 62 37 aa 23°5| 38 37 31 31 48 27 65 58 39 49 3 6 — _ 
114 7° 43 130 22 39 40 34 34 51 29 68 63 41 54 4 7 1174 = 


† Berry & Robertson accept Nos. 29, 30 and 31 as Tasmanian. ‡ Wood Jones & Campbell accept Nos. 65, 66 and 67 as Tasmanian. 


| 
— 
127 
— 
122 
— 
— 
=e 
. 
Ls 
109 
II 
123 | 
1o2 
| 
| | 
! | 
| 


APPENDIX III 
of Tasmanian full-bloods and Tasmanian-Australian mixed-bloods* 


Skulls of Tasmanian full-bloods 


Inter- | Alv. | | Min. | Max. BL | Re | 
J:8 | orb. | B: g | arch | thick-|thick-| C:24 | w,:25 | gon. | mus 8a .| 
Bz15 17 7 13 * 799 | 199 | neas | ness B:26 | L:27; 7° 
129 22 38 37 33 33 48 27 68 62 38 51 6 7 1270 —_ — 62 31 36 31 
23 37 3 27 27 42 46 6 Il 1250 
— | 22 3 3 2 2 44 — — 
98 39 30 29 | 49 27 | 66 | 60 | 38 1378 = 
135 | 225) 39 | 39 | 30 | 29 | 48 | 28 | 73 | — | 4 | — | 5 | 
21 38 38 30 | 30 | | 27 | | | | 
139 23 38 4 30 2 44 | = 3 9 | 1330 | 127 102 57 31 43 
131 25 7 38 31 31 43 28 | 66 ae | | | 97 4° 29 
131 | 22 | 365) 305) 30°5| 48:5) 26:5) — | 385) 52 4 7 | — | — | 
22 | 36 2 41 25°5 62 63 34 | 53 | 4 | 
124 21 a7 37°5| 30 29°5| 44 | 2 66°5| 62 49 | 305| 365) 32 
22 39 27 | 7o | 62 | 40 52 4 | 86 57 | 30 | 37:5] 28 
119 21 37 37 30°5| 42 25 | 62 54 | 37 | 44 35 65| | — | — | — = 
a3 | 37 | 37 | 29 | 4 | 25 | G2 | G2 | 37 | 5 7 | 1320 | | 
126 22 39 37 29 30 45 | 27 | 65 | 60 39 | 49 4 7 | rs | — | — ot fe ss ae 
137 | 25 | 40 | 38 | ar | 29 | so | 20 | 7t | 64 | 45 | 52 | — | — | 998 | — | — | —| —] —] = 
123 23 36 36 | 30 30 44 | 26 58 | 109 | or | 46 | 32 | 38 31 
| } | | | | 
| — | i 2 — 8 — | | — 
24 37 | 32 | 40 25 4 38 = see | 
121 22°5 37 37 29 29 | 2 | 24 | 64 59 36 | #51 3 | 1296 | — 
20°5| 37 37 29 31 42 | 28 3 6 | 
131 26 37 37 32 31 | 48 28 | | — 1428 | 
Ber 
| 23 | 97 | 97 | 31°5 44°5| 25 | 63 = 4) 7 | | | 45 | — 
| 35 | 355) 30 | 3% | 20 | 625) 57 | 365| 475] 4 | 7 | | | — 
— | 215] 385] 38 | 31 | 30 | 43 | 25:5) 66 | 57 | 385! 3 | 7 | — 
529 | 22:5] 37 | 39 | 30 30 | 47 | 26 | 68 59 |) 6 | 1220 — — 
124 | 18 37 38 | 28:5] 29 | 42 | 24 | 63 53 | 40 | 4g} ee oe 1362 23 100 | 56 | 30 | 42 31 
| 23 | 39 | 39 | | | 22°5| | 2 | 94 | 53 | 33 | at | 29 
an mixed-bloods found in the domain of the west coast tribe 
| 
23 | 31 49 | 27 71 63 | 42 56 3 9 1148 = 
1399 | — | 37 | 37 | 30°5| 30°5| 51 4 
141 23°5| 42 | 43 | 31 33 | 27°5| 72 | 66 | 41 59 4 8 1383 | | 
134 23°5| 40 42 31 30 57 22°5| 69 61 | 41 54 3 7 1323 112 95 61 33 41 31 
130 24 38 38 32 32 ¥ 28 70 . | 44 52 5 9 1170 119 86 66 36 46 33 
23'°5| 3 37 31 31 4 27 5 | 39 49 3 = = 
130 ae 39 4° 34 34 51 29 68 63 | 41 54 4 7 1174 <r] +e 52 32 42 32 


† Berry & Robertson accept Nos. 29, 30 and 31 as Tasmanian. 


‡ Wood Jones & Campbell accept Nos. 65, 66 and 67 as Tasmanian. According to Hrdlička these three 


awe 
; 


j 
Max. Ht. of | Phick- | | | 

B hy:29 body: |2ess of} Mand. 100 100 100 100 100 | 100 
“| body: | Z :32 B/L H’/L B/H’ B’/B | GH/J |NB/NH| 3 alt | | fmb/fmi 

2 31 

36 31 25 | 107-9 69°1 52°7 563 | 86°8 89-2 74°5 82-9 
= 73°71 69°5 | 05°99 | 641 65°8 =- 
79°3 69°0 115-0 65°8 619 | 730 | 75° 81-3 
= 778 730 106-7 660 | — 551 | 789 | 7474 8r-1 | 
| 742 | 676 | | 7o4 | — 66-7 | 74°4 744 87-1 
| 751 720 1044 | 68:3 | 489 | 58:3 | 769 74°4 76°9 
| 735 | 718 | 102-3 | 66-2 587 | 79 | 79 | — 74°7 
- — 72°5 704 103°0 jos | — 600 | 895 | 88 | — 81-6 
I 43 =. 4.55 73°4 100-7 68-2 | 78-9 44 = 78-9 
I 40 29 26 14 132° 75°3 731 102-9 707 | — | 83:8 81-6 87°5 
= = = 77°3 == 66-9 =< = 
| 72°7 70°6 | 69-9 546 | 836 | 83-6 740 | 94-2 
— — 72:1 67°9 106-2 679 | — | — 78-6 
| 32 15 | 130° | 77% 720 | 107-1 63-7 | 500 | 545 | 78-7 | 843 | 806 
D. | 37°5| 28 24 115° 73°5 65°4 112-4 55°3 69-2 76-9 79°5 
772 73°7 | 673 | 47°5 | 59°5 | 78 82-4 | 841 | 80-0 
— 74°3 710 104°6 70-6 — 77:0 78-4 740 | 83-3 
— | — | — 747 75°8 98-6 69°5 48-4 60-0 74°4 | 82-9 
- | — — — 113°8 66-4 85-3 
| = =} = — 741 70°6 53°3 58-0 77°5 76°3 86-5 82-9 
2 | 38 | 33 24 15 | 130° | 77° 77°6 992 | 672 | 496 | Sor | 83 83:3 | 63-3 | 81-3 
| 68-2 116-9 64:2 56°8 86-1 83-3 87-9 
42 75°4 714 105°6 73°5 79°5 73°5 
37 24 | I2 123° 75°0 71-6 104°8 68-2 88-2 
777 | 687 | 1130 | 669 | 51-2 | 574° | 784 | 784 | 706 | 
— 79°8 74:0 107'8 77°6 -- 60-0 75°0 7771 
- | — = 721 70°8 58-3 86-5 89-2 

II-5| 129° | 77°6 108-4 65°8 47°3 56-2 85:1 69:2 93°9 
— | — 73°7 58°4 85°7 87°3 76°8 84-7 
76:8 72°90 105*4 68-4 59°3 80-5 78-9 82:8 78-4 
| — | 985 | | | 55:3 | 769 | 816 | 83-3 
o | | 3 27 il 122° 77° 74°2 103°8 67:2 44°4 571 76°3 88-9 87-8 
3 | 41 | 29 23 17 110 75'8 108-0 67°4 87-2 87-2 81-6 

T 

—j— 70°4 67:2 104:8 55°1 81-6 86-1 75°0 750 
72'5 99°3 69°9 58-8 82-4 82-4 79°8 81-1 
— | 69° | 66-2 105:2 66.9 | 53°9 73°8 767 | -69°5 | 82-5 
41 31°5| 28 15 | 115° | 74:9 69°1 108-3 66-4 59°0 39°5 714 759 85°3 
6 46 33 30 16 68-4 105'2 81-6 52°3 53°8 84:2 84:2 84:6 81-6 
32 42 32 27 15 121° 74:2 69:2 53°8 87-2 85-0 75°9 87-1 


skulls are of Australian type. § Hrdlička accepts No. 109 as Tasmanian. 


Measurements of tibiae 




Serial |            | No. of measurement 
letter | Museum No. | 1     | 2     | 3     | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11 
A      | A(E.H.)794 | 366   | 357   | 351   | 68   | 31   | 24   | 25   | 36   | 41   | 33   | 81 
B      | 797        | 393   | 389   | 377   | —    | 35   | 22   | 25   | 42   | 45   | 34   | 85 
C      | 793        | —     | —     | —     | —    | 40   | 24   | —    | —    | 55   | —    | 90 
D      | 795        | 388   | 380   | 369   | 78   | 33   | 20   | 23   | 39   | 46   | 32   | 81 
E      | 798        | 355   | 352   | 342   | 70   | 31   | 19   | 24   | 37   | 43   | 32   | 71 
F      | 796        | —     | —     | —     | —    | 35   | 23   | —    | —    | —    | —    | 85 
No.    |            | 4     | 4     | 4     | 3    | 6    | 6    | 4    | 4    | 5    | 4    | 6 
Mean   |            | 375.5 | 369.5 | 359.7 | 72.0 | 34.2 | 22.0 | 24.2 | 38.5 | 46.0 | 32.7 | 82.2 

Serial | Length       | Thickness   | Cnemic 
letter | index (11/1) | index (6/5) | index (7/8) 
A      | 22.1         | 77.4        | 69.4 
B      | 21.6         | 62.9        | 59.5 
C      | —            | 60.0        | — 
D      | 20.9         | 60.6        | 59.0 
E      | 20.0         | 61.3        | 64.9 
F      | —            | 65.7        | — 
No.    | 4            | 6           | 4 
Mean   | 21.2         | 64.6        | 63.2 


Five of the nine unsexed femora are hyperplatymeric (74.9 and under) and the other 
four are platymeric. Two of the tibiae are platycnemic (55.0 to 62.9) and the other two 
mesocnemic. Turner (1910) states that the stature of the Tasmanians, “as determined by 
measurements made during life”, ranged “in men from 5 ft. 1 in. to 5 ft. 6 or 7 in., with a mean 
of 5 ft. 3} in., and in women from 4 ft. 3 in. to 5 ft. 4 in., with the mean 4 ft. 11} in.” 
Using the formula to which he refers, stature = 2 (oblique length of femur + condylo-astragalar 
length of tibia) + 26 mm., the mean measurements of the unsexed Eaglehawk 
Neck femora and tibiae give a mean stature of 5 ft. 5 in. 
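
The stature formula quoted above can be checked against the tabulated means with a short Python sketch. The text does not say which of the listed tibial measurements is the condylo-astragalar length; the sketch assumes it is the axial length (measurement 3, mean 359.7 mm), and that choice is an assumption made here for illustration, not a statement from the paper. All function names are hypothetical.

```python
MM_PER_INCH = 25.4


def stature_mm(femur_oblique_mm, tibia_condylo_astragalar_mm):
    """Stature = 2 * (oblique femur length + condylo-astragalar tibia length) + 26 mm."""
    return 2 * (femur_oblique_mm + tibia_condylo_astragalar_mm) + 26


def feet_and_inches(mm):
    """Convert millimetres to (whole feet, remaining inches)."""
    total_inches = mm / MM_PER_INCH
    feet = int(total_inches // 12)
    return feet, total_inches - 12 * feet


MEAN_FEMUR_OBLIQUE = 455.7  # mean of femoral measurement 2, from the table
# ASSUMPTION: the axial length (tibial measurement 3) stands in for the
# condylo-astragalar length; the paper does not identify the measurement used.
MEAN_TIBIA_AXIAL = 359.7

if __name__ == "__main__":
    s = stature_mm(MEAN_FEMUR_OBLIQUE, MEAN_TIBIA_AXIAL)
    feet, inches = feet_and_inches(s)
    print(f"{s:.1f} mm = {feet} ft {inches:.1f} in")
```

Under that assumption the formula gives about 1657 mm, which agrees with the stated 5 ft. 5 in. to the nearest inch; using the mean direct length (369.5 mm) instead would give roughly an inch more.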




NOTE ON Dr J. WUNDERLY’S SURVEY OF 
TASMANIAN CRANIA 


By G. M. MORANT 


THERE can be little doubt that some of the alleged Tasmanian skulls in Commonwealth 
collections for which data have been published are not of pure Tasmanian origin, and 
anthropologists are indebted to Dr Wunderly for having made a comprehensive and careful 
enquiry into the authenticity of each specimen. He explains that a number should be 
rejected, partly because there are no adequate records to authenticate them, but prin- 
cipally on account of the fact that their characters distinguish them from the genuine 
crania. One may accept his diagnosis as correct in the majority of cases, at least, and yet 
remember the danger that anatomical selection of a racial group may lead to a sample 
with unnaturally small variability. An examination of any random series of skulls which 
correctly represents a specialized racial population—such as the Guanche, the Andamanese 
or the Greenland Eskimo—shows that a number of the individuals included may depart 
quite markedly from the type for the series. 

Dr Wunderly’s measurements of “full-blood Tasmanian” crania may be compared 
with those given previously for a sample which almost certainly includes some spurious 
specimens. The series made up by the present writer (1927) by pooling data given by a 
number of anthropologists may be used for this purpose. It includes more than half of 
Dr Wunderly’s accepted sample, a few specimens rejected by him, and a number in Euro- 
pean collections—some of which may be unauthentic—which he has not measured. In 
comparing constants for these two groups it must be remembered that personal equation 
in measuring and changes in the estimates of sex may be partly responsible for the differ- 
ences observed, while sampling errors are large as both series are small. Means are given in 
Table I on p. 321 above. The differences between corresponding pairs are nearly all of the 
order expected for samples of the sizes available drawn from the same population, and the 
agreement in the case of the indices is particularly close. A bad agreement is only found in 
the case of the interorbital breadth, and it is extremely probable that this is due to the fact 
that different definitions of the measurement were used. If it is ignored, male and female 
comparisons can be made for twenty-two absolute measurements. For these Wunderly’s 
female mean exceeds Morant’s in eighteen cases, but the same tendency is not observed in 
the case of the male series. It can be seen from Dr Wunderly’s table of individual measure- 
ments that in his “full-blood Tasmanian” series the fifteen male skulls previously described 
had all been classed as male by the earlier anthropologists, but of his nineteen female nine 
had previously been classed as male. The transference of an appreciable proportion of 
specimens from the male to the female group will be expected to lower the male and raise 
the female means. Only one of these effects is observed, but in view of the close agreement 
of the indices it appears probable that the resexing of the material has had more effect on 
the means of absolute measurements than the rejection of doubtful specimens. 

A few standard deviations (with their probable errors) are given in the table below. It 
is customary to find that these constants tend to be slightly less for a female than for a 
male series representing the same population in the case of absolute measurements, and 
to be approximately the same for the two sexes in the case of indices. Wunderly’s male and 
female series are quite unexceptional in this respect, but when compared with Morant’s 
their standard deviations are seen to be appreciably less in the case of the maximum 
calvarial breadth and cephalic index. 
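
The probable errors attached to the standard deviations in the table that follows are consistent with the usual large-sample approximation, probable error of s = 0.6745 s / √(2n). The text does not state the formula, so its use here is an assumption, checked only against the printed values; the function name is illustrative.

```python
import math


def probable_error_of_sd(sd, n):
    """Large-sample probable error of a standard deviation: 0.6745 * sd / sqrt(2n).

    ASSUMPTION: this is the conventional approximation of the period; the note
    itself does not say how its probable errors were computed.
    """
    return 0.6745 * sd / math.sqrt(2 * n)


if __name__ == "__main__":
    # (standard deviation, sample size) pairs read from the table
    for sd, n in [(7.58, 30), (6.01, 43), (5.94, 25), (4.11, 27)]:
        print(f"{sd:.2f} +/- {probable_error_of_sd(sd, n):.2f} ({n})")
```

Applied to the entries for maximum length and maximum breadth, this returns the printed probable errors (0.66, 0.44, 0.57 and 0.38) to two decimal places.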
                                         Male                                      Female 
                               Tasman series       Pooled “Tasmanian”     Tasman series 
                               (Wunderly)*         series (Morant)        (Wunderly)* 
Maximum length (L)             7.58 ± 0.66 (30)    6.01 ± 0.44 (43)       5.94 ± 0.57 (25) 
Glabella-inion length          8.12 ± 0.71 (30)    —                      5.77 ± 0.55 (25) 
Maximum breadth (B)            4.11 ± 0.38 (27)    5.32 ± 0.33 (60)       4.62 ± 0.44 (25) 
Maximum frontal breadth (B″)   4.36 ± 0.42 (25)    —                      4.36 ± 0.42 (25) 
Bimastoid                      5.42 ± 0.51 (26) 
Minimum frontal breadth (B′)   4.35 ± 0.41 (25)    4.81 ± 0.29 (62)       4.15 ± 0.38 (27) 


Basio-bregmatic height (H′) 
Auriculo-bregmatic height 
Horizontal circumference (U) 
Arc nasion to bregma (S₁) 
Arc bregma to lambda (S₂) 
Broca’s transverse arc 
Interorbital breadth 
100 B/L 
100 H′/L 
100 B′/B 


4-76+0-31 (55) 


| 

| 14-10 +0-97 (48) 
| 5-9840-43 (44) 
| 677+40-50 (42) 
| 10-92 + 0-82 (40) 


2-58 + 0-19 (43) 
2-21+40-17 (37) 


| 


5-07 + 0-47 (26) 
5-59 + 0-54 (24) 


1-66 + 0-16 (24) 
3-00 +.0-29 (25) 


* Constants provided by Dr Wunderly. 


The following distributions are for the latter character. The samples are too small to 
yield any decisive conclusions, but there is certainly a suggestion that the female dis- 
tribution for Dr Wunderly’s measurements has been curtailed. The standard deviation 
for it is appreciably lower than that recorded for an unselected series of skulls from any 
part of the world. 


| | 

(central values) 68-5 | 69-5 | 70-5 | 71-5 | 72-5 | 73-5 | 74:5 | 75-5 | 76-5 | 77-5 | 78-5 | 79-5 | 80-5 |Totals 
| ¢ Wunderly| — | — 1 2 5 8 3 2 1 + 0 1 — | 27 
| Morant 1 2 35| 05] 4 8 95| 75) 0 15| 4 15) — | 43 
Q Wunderly| — | — | — |] — |] — 3 15) 4 6 65] 2 1 — | 2 

Morant 1 0 0 3 05] 15] 3 2 2 3 2 0 | 19 | 

| 


In my paper giving the means of the pooled series of Tasmanian skulls the coefficient of 
racial likeness with an Australian (the A) series is provided and it may be asked how such 
a comparison is affected by the revision of the material. Of the thirty-one characters used 
when possible for this purpose, there are twenty-one available for all three male series and 
these give the following coefficients. The Australian A standard deviations were used in 
these comparisons, though these are probably rather greater than the true values for the 
Tasmanian population, and hence all the coefficients are probably somewhat lower than 
they should be. 

                                                                Crude C.R.L.     Reduced 


Tasmanian: Wunderly (n̄ = 21.7) and Tasmanian: Morant (48.3)     0.66 ± 0.21      2.21 ± 0.69 
Tasmanian: Wunderly and Australian A (113.2)                    11.34 ± 0.21     31.14 ± 0.57 
Tasmanian: Morant and Australian A                              13.72 ± 0.21     20.27 ± 0.30 
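
The relation between the crude and reduced coefficients above can be illustrated with a short Python sketch. It assumes Pearson's usual conventions, which the note does not spell out: the reduced coefficient is 50 × (n̄₁ + n̄₂)/(n̄₁ n̄₂) times the crude one, and the probable error of the crude coefficient for M characters is 0.67449 √(2/M). Function names are illustrative.

```python
import math


def reduced_crl(crude, n1_mean, n2_mean):
    """Reduced coefficient of racial likeness.

    ASSUMPTION: Pearson's convention, reduced = 50 * (n1 + n2) / (n1 * n2) * crude,
    where n1 and n2 are the mean numbers of skulls per character in each series.
    """
    return 50.0 * crude * (n1_mean + n2_mean) / (n1_mean * n2_mean)


def probable_error_crude(n_characters):
    """Probable error of the crude coefficient: 0.67449 * sqrt(2 / M)."""
    return 0.67449 * math.sqrt(2.0 / n_characters)


if __name__ == "__main__":
    pe = probable_error_crude(21)  # twenty-one characters were available
    print(f"probable error of crude coefficient: {pe:.2f}")
    # Wunderly (mean n = 21.7) against Australian A (mean n = 113.2)
    print(f"reduced: {reduced_crl(11.34, 21.7, 113.2):.2f} "
          f"+/- {reduced_crl(pe, 21.7, 113.2):.2f}")
```

With the mean sample sizes printed in the table this reproduces 31.14 ± 0.57 for the second comparison, and agrees with the other two reduced values to within the rounding of the crude coefficients.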


| 5-11+0-50 (24) = 
3-72 + 0-35 (25) 
16-1641-5 (25) 
6-75 + 0-62 (27) | 
| | §-71+0-51 (28) 
| 11-0641-1 (25) 
| 2-56 +0-24 (25) 
2-14+0-20 (27) | 
2-39 + 0-23 (24) 
2-71 + 0-26 (24) 
The first of these comparisons is not really justifiable, since a certain number of speci- 
mens is common to the two Tasmanian series, but it shows that they have very similar 
mean measurements. At the same time, judging by the reduced coefficient, the selected 
series is distinctly further removed than the other from the Australian A series. The same 
situation is observed for all the characters considered singly which show significant 
differences between the Australian mean, on the one hand, and both Tasmanian means, 
on the other, with the single exception of the cephalic index. These means are: 


                           100 B/L       100 B/H′       100 NB/NH      B 
Tasmanian (Wunderly)       74.2 (27)     106.2 (22)     59.9 (19)      138.2 (27) 
Tasmanian (Morant)         74.2 (43)     103.9 (35)     59.1 (57)      136.0 (60) 
Australian A               70.8 (94)     99.3 (156)     54.6 (132)     132.2 (162) 

                           LB            GH             O₂             NH 
Tasmanian (Wunderly)       98.1 (22)     62.4 (12)      29.9 (19)      45.1 (21) 
Tasmanian (Morant)         98.8 (55)     62.5 (36)      31.1 (60)      47.1 (58) 
Australian A               102.1 (137)   66.8 (79)      33.5 (118)     49.5 (118) 


Dr Wunderly’s selection of the skulls has thus had the effect of modifying our conception 
of the Tasmanian type and making it still less like the Australian. The revision doubtless 
gives a closer approximation to the truth. 

Wunderly: Cranial and other Skeletal Remains of Tasmanians 


A typical female Tasmanian skull: Tasman series No. 38. 

Biometrika, Vol. XXX, Parts III and IV Plate II 

Typical male Tasmanian skulls: A, Tasman series No. 35; B, No. 36. 

Biometrika, Vol. XXX, Parts III and IV Plate III 




Typical female Tasmanian skulls: A, Tasman series No. 37; B, No. 34. 

Typical Tasmanian skulls: A, Tasman series No. 34, female; B, No. 35, male; 
C, No. 36, male; D, No. 37, female. 


DISEASE AND ENVIRONMENT 


By E. A. CHEESEMAN, W. J. MARTIN AND W. T. RUSSELL 
Of the Medical Research Council’s Statistical Staff 


From the Division of Epidemiology and Vital Statistics, 
London School of Hygiene and Tropical Medicine 


A CHARACTERISTIC feature of vital statistics is the greater mortality in town 
than in country. Why should this be? The most obvious difference is the closer 
contact between human beings. It may indeed be true that in some villages 
domestic overcrowding is as great as in towns, but the number of persons per 
acre is greatly less and the occupations of town-dwellers, whether in offices, 
shops or factories, involve a longer continued and more intimate contact of 
human being with human being than is the rule in country life. 

The conditions of town life (even if we do not reckon the intense, temporary 
overcrowding, consequent upon transport from suburbs to centre, a very im- 
portant factor of modern urban life) are evidently favourable to the droplet 
infection which, in so many diseases, is believed to be a principal means of 
transmission. Hence the relation between density of population and the in- 
cidence of disease and death must always be worthy of close study. 

It is not a region hitherto unexplored but, in most of the early investigations, 
the effect of density has been measured solely in terms of mortality. This index 
is not quite satisfactory because before 1911 no transference of death was made 
to place of residence, and hence there must have been many instances in which 
there was an appreciable difference between the ostensible and real mortality 
of districts containing hospitals and institutions. In the present paper the work 
of earlier investigators has been reviewed and an attempt made by the adoption 
of more recent statistics to assess the relationship, not only between density 
and mortality in general, but also between density and the morbidity from 
infectious disease. 

PREVIOUS INVESTIGATIONS 
Farr 

The problem of density and health was first examined by Farr. In the
appendix to his Fifth Annual Report (Farr, 1843) he endeavoured to show that
there was a definite relationship between density and mortality which was
described by an equation of the form

$$D = C\delta^{n},$$

where $D$ is the crude death rate and $\delta$ the density (number of persons per square
mile). He found in his examination of the statistics for the districts of London
as then constituted that “the mortality did not increase as their density but
as the 6th root of their density”. Farr (1875, pp. xxiii-xxiv) returned to the
subject with an analysis of the data for 1861-70 and, in the Decennial Supplement
for that period, gave the following account:


A larger basis is now supplied by the facts of the ten years recorded in all the districts
of England and Wales. They have been arranged in the Tables; and with this result, that
in every group the mortality increases with the density, but happily not in direct proportion
of the density. London has been excluded in the following calculations. Thus in 345
districts with a mortality of 19.2 the density was 186 persons to a square mile; in 9 districts
with a density of 4499 what was the mortality? In the first place it was not expressed by
the proportion of 186 : 4499 :: 19.2 : x but by this proportion:

$$186^{0.12} : 4499^{0.12} :: 19.2 : x = 28.1.$$


The observed and calculated rates as deduced by Farr for varying densities
were:


  Group of      Density (persons        Crude death rate
  districts     per square mile)      Observed    Calculated
  I                    166             16.75        18.90
  II                   186             19.16        19.16
  III                  379             21.88        20.87
  IV                  1718             24.90        25.02
  V                   4499             28.08        28.08
  VI                 12357             32.49        32.70
  VII                65823             38.62        38.74


He inserted the following footnote to this table (Farr, 1875, p. clviii):


m being the mortality in any group and m′ being the higher mortality at any other
group, D and D′ being the density of population in the two groups then

$$\frac{m'}{m} = \left(\frac{D'}{D}\right)^{0.11998}.$$

The mortality of the districts is nearly as the 0.12th root of their densities or taking the
above value of n, and p and p′ as the mean proximity of person to person we have

$$\frac{m'}{m} = \left(\frac{p}{p'}\right)^{2n}.$$

So the mortality of the district is nearly as the 6th root of the proximities.


This statement is not arithmetically correct, as according to the value of n
obtained by Farr the mortality varies as the 0.12th power and not as the 0.12th
root of the density.
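
The calculated column of Farr's table can be checked directly from his relation. The following is a minimal sketch, assuming Group II (density 186, mortality 19.16) is the reference group and using the exponent 0.11998 from the footnote; the name `farr_rate` is ours and does not come from the source.

```python
# Farr's relation m'/m = (D'/D)**n, anchored on Group II of his table.
# Assumption: Group II is the reference group, since its observed and
# calculated rates coincide in the table.
N = 0.11998          # exponent quoted in Farr's footnote
REF_DENSITY = 186    # persons per square mile, Group II
REF_RATE = 19.16     # crude death rate observed in Group II


def farr_rate(density, n=N, ref_density=REF_DENSITY, ref_rate=REF_RATE):
    """Death rate predicted at `density` by scaling from the reference group."""
    return ref_rate * (density / ref_density) ** n


for density in (166, 379, 1718, 4499, 65823):
    print(density, round(farr_rate(density), 2))
```

The exponent enters as a power. Raising the density to the power 0.12 is the same as taking roughly its eighth root (1/0.12 is about 8.3), which is the distinction drawn in the preceding paragraph.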

Ogle and Tatham 


Neither of Farr’s immediate successors, Ogle and Tatham, shared his belief 
that mortality was purely a function of density of population. Tatham (1895, 
p. xlvi), discussing the density and mortality figures for 1881-90 in relation to
Farr’s “law”, wrote:


although density and mortality generally increase or decrease together the relation between 
them is now too complex to admit of being expressed by a formula similar to that alluded 
to above. 


Brownlee 


For Brownlee (1922), Farr’s equation held a considerable fascination, and 
the following quotations are a striking testimony of his belief in the soundness 
of the conception and its application to vital statistics. 


His (Farr’s) treatment of it is one of the brilliant attempts to extract the real meaning 
of figures so frequent in his work, but though this theory has not shared in the complete 
neglect that has been the lot of his attempt to put a quantitative measure to the course
of epidemics it has suffered as much from the kind of patronage with which it is usually
discussed. 


He revived interest in the law and demonstrated its applicability to domains 
other than public health. In its relation to density and disease he stressed the 
fact that owing to wide variability in the mortality of districts possessing the
same density of population the law can really only be a law of average. In his
view 


the effect of density is not merely as density. The country preserves life even in the presence 
of excess or dissipation: the town does not. Further, in the period of growth, children in 
the city do not get anything like the same chance as their fellows in the country, even 
though housing may be better and food more abundant. In addition, filth in the country is, 
at its worst, in most cases but a local nuisance spreading enteric and diarrhoea at times, but 
not having the power of rendering a whole district foetid. All these influences act con- 
currently and cumulatively to depress health the more closely people are crowded together, 
and as life is a physico-chemical process this effect must be measurable and should be
capable of expression in some formula which goes back to chemistry and physics. Such a
formula is that of Dr Farr. 


The necessity of applying an equation of this nature to describe the relationship
between density and mortality statistics was demonstrated by Brownlee
in the statistics for Glasgow for the years 1898-1902. 


Scarlet fever Enteric fever 
| Group Population | Room 
districts 1901 | density 
Cases | C.F. | Cases | GF. 
| 
| 34868 0-5 -1-0 106 | 
it 83255 1-0 -1:5 | ems | 38 339 | «(128 | 
It 201098 1-5 -2-0 | 5439 1308 | 158 | 
IV 87885 20-225 | 218 | 50 a= 
237161 2-25-25 | 5610 51 1743 | | 
VI 117445 2-5 2091 5-6 1003 163 | 


The significance of the trend of these case fatality
rates is fairly obvious. Taking first the statistics for enteric, it is seen that, had
an investigation on the influence of insanitary conditions been confined to those
localities in which the density value was over 1.5 persons per room, the inevitable
conclusion would be that the environment of the person had no influence
on the severity of the disease, as the fatality rates are nearly constant. Similarly
for scarlet fever, but in this instance the upper limit of fatality is not reached
till the concentration of the population is that of two persons per room.

Having been fully convinced that the relationship of density was best 
described by Farr’s law, Brownlee then proceeded to apply the formula to the 
mortality data in the Registration Districts of England and Wales, grouped 
according to their densities for each decennial period since 1861-70. It will be
remembered that Farr did similar calculations for the first decennium, but for 
his index of mortality he used the crude death rate. Brownlee would not accept 
either the crude or standardized rate as a suitable measure of ill-health. In his 
opinion the standardized death rate represented an impossible mortality in a 
stationary population: a standardized death rate of 13.49 per 1000 in the
healthy districts would, in a stationary population, yield a mean life of

$$\frac{1000}{13.49} \text{ or } 74 \text{ years},$$

a figure, he said, hardly conceivable if the observed properties of life represent
anything which is fundamental. But it is difficult to understand the reason why 
either the crude or standardized death rate should be dependent on what they 
represent in a stationary population. The crude death rate is a reality—the 
population has actually died at that rate—whereas the life table death rate, his 
preference, is governed by hypothetical considerations. 
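
Brownlee's figure of 74 years follows from the identity that, in a stationary population, the death rate per head is the reciprocal of the mean length of life. A minimal sketch of that arithmetic, with a function name of our own choosing:

```python
def mean_life_years(death_rate_per_1000):
    """Mean length of life implied by a death rate in a stationary population.

    Assumes the stationary-population identity: rate per head = 1 / mean life.
    """
    return 1000.0 / death_rate_per_1000


print(round(mean_life_years(13.49), 1))
```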

It may seem extraneous to our investigation to dwell at length on the 
appropriate measure of mortality which should be inserted in the formula, but 
to appreciate Brownlee’s work it is necessary to do so. He definitely maintained 
that the life table death rate was the only satisfactory criterion of ill-health: 

it has one property which places it as a measure above either the standardized death rate
or the crude death rate in as much as it has been found for England and Wales to be very
closely connected with the density of population.


It must be pointed out that the life table death rates which he calculated for 
the various groups of districts were deducible from the standardized rates by 
means of linear equations. The final equations of the densities and the life table 
death rates in the Registration Districts of England and Wales as obtained by 
him for the successive decennial periods were: 


1861-70,     $D = 12.42\,\delta^{0.1001}$,
1881-90,     $D = 11.45\,\delta^{0.0985}$,
1891-1900,   $D = 10.83\,\delta^{0.10\ldots}$,
1901-10,     $D = 9.90\,\delta^{0.1023}$.

From these equations he concluded that:


though general health has improved the power of the density has stood unchanged for 
forty years. That is to say that the death rate and the density remained related in essen- 
tially the same way in the counties of England and Wales in 1905 as in 1865. It is the 
constant multiplier that has been affected by hygienic measures and not the law of the
power. Hygiene acts surely all round but still is subjected to fundamental laws. 


His adoption of the life table death rates in preference to either of the other 
measures invoked strong criticism.

In a discussion which took place at the Royal Statistical Society (1922) on 
“The Value of Life Tables in Statistical Research” the majority decried their 
use for this purpose and also opposed the theory that because it yielded a constant
index in Farr’s law the life table death rate possessed an intrinsic value.
In the discussion the point was made that, since the life table death rate was 
deduced from the standardized death rate, by means of a linear relation, the 
relatively smaller variability of the life table death rate might be a mere arte- 
fact. This, however, does not impugn the conclusion that mortality, whatever 
the standard of measurement, does vary with density and that the relation is 
curvilinear. The death rates for 1920-2 and 1930-2 for London shown in the 
following diagram illustrate the point. The death rates for 1920-2 steadily 
increase over the range of densities, but the mortality during 1930-2 is at a
maximum when the density is 1.37 persons per room and then declines slightly.
This decline may be due to a random fluctuation arising from the smallness of
the population living at the highest density group in 1930-2 as compared with
1920-2. Or possibly a saturation point is reached in 1930-2 at about 1.22
persons per room when, with increasing density, there is no pro rata increase in 
mortality. The last hypothesis could be reconciled with the results for 1920-2, 
since the saturation point will vary with the general level of the mortality rate. 
It would be higher in 1920-2 than in 1930-2 owing to the higher mortality rate 
in the former period, i.e. the saturation point in 1920-2 apparently occurs 
outside our range of densities. The results for 1930-2 give confirmation to an 
observation previously noted by Brownlee—that it is futile to conduct en- 
quiries on the effects of environment on health and disease over a restricted 
range of environmental conditions. If such an enquiry had been conducted 
in London in 1930-2 and confined to districts having densities of 1.22 persons
per room and over, then it is obvious that from a mortality viewpoint little or 
no differentiation would have arisen. 

[Diagram: death rates by density (persons per room), London 1920-2 and London 1930-2.]

It seemed of interest to compare the relationship between density and
mortality as described by a simple curve and a straight line respectively. For
this purpose the standardized death rate was related to the two modern measurements
of density, i.e. the number of persons per room and the percentage of the
population living more than two to a room, and the crude death rate to the older
index, the number of persons per acre. The statistics which were utilized to
effect the comparison were the death rates for the London boroughs for 1930-2,
grouped into six classes according to density. The results are:


London, 1930-2


  Mean no. of            Standardized    Farr’s expression    Straight line
  persons per room (δ)   death rates     10.19173 δ^n         2.9181 δ + 7.2388
  0.77                       9.34             9.38                 9.49
  0.92                       9.87             9.93                 9.92
  1.07                      10.34            10.41                10.36
  1.22                      11.14            10.85                10.80
  1.37                      11.56            11.26                11.24
  1.52                      11.23            11.63                11.67
  Root mean square error                      0.24                 0.27

  Mean % living          Standardized    Farr’s expression    Straight line
  more than two          death rates     C δ^n                0.11337 δ + 8.6626
  to a room (δ)
  5.5                        8.87             8.93                 9.29
  10.5                      10.01             9.96                 9.85
  15.5                      10.63            10.64                10.42
  20.5                      11.31            11.16                10.99
  25.5                      11.73            11.58                11.55
  30.5                      11.67            11.94                12.12
  Root mean square error                      0.14                 0.31

  Mean no. of            Crude           Farr’s expression    Straight line
  persons per acre (δ)   death rates     7.69435 δ^n          0.01670 δ + 10.6490
  22.5                      10.52            10.66                11.02
  47.5                      11.64            11.53                11.44
  72.5                      12.26            12.05                11.86
  97.5                      12.50            12.43                12.28
  122.5                     12.78            12.74                12.69
  147.5                     12.71            12.99                13.11
  Root mean square error                      0.16                 0.33

Although the advantage lies in each case with Farr’s expression, the advantage
is not significant. Applying the z test to the corresponding estimates of
variance and combining the results, the improbability of such a concordance
as observed is only moderate (P = 0.078). It must indeed be remembered
that the method of fitting of Farr’s equation (least squares applied to the
logarithms) is not efficient, so that the superiority of the fit may be
underestimated.
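
The comparison for the persons-per-room series can be retraced as follows. This is a sketch under stated assumptions, not the authors' own computation: the straight line is fitted by ordinary least squares, Farr's expression by least squares on the logarithms (as the text describes), and the root mean square error is taken over the six groups with divisor 6, which is the divisor that appears to match the printed values. All function and variable names are ours.

```python
import math

# London, 1930-2: mean persons per room and standardized death rates,
# as read from the first block of the table above.
persons_per_room = [0.77, 0.92, 1.07, 1.22, 1.37, 1.52]
death_rate = [9.34, 9.87, 10.34, 11.14, 11.56, 11.23]


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_power(xs, ys):
    """Fit y = c * x**n by least squares on the logarithms; returns (c, n)."""
    n, log_c = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_c), n


def rms_error(observed, fitted):
    """Root mean square error with the number of groups as divisor."""
    squares = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(squares) / len(squares))


slope, intercept = fit_line(persons_per_room, death_rate)
c, n = fit_power(persons_per_room, death_rate)

rmse_line = rms_error(death_rate, [slope * x + intercept for x in persons_per_room])
rmse_power = rms_error(death_rate, [c * x ** n for x in persons_per_room])

print(f"straight line: {slope:.4f} d + {intercept:.4f}, RMS error {rmse_line:.2f}")
print(f"Farr's form:   {c:.4f} d^{n:.4f}, RMS error {rmse_power:.2f}")
```

Because the power curve is fitted in the logarithms, it minimizes the squared error of log death rates rather than of the death rates themselves, which is why the text warns that the fit of Farr's expression may be understated.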

Any discussion on the effects of density on mortality would be very incomplete
without mention of the painstaking studies by Stocks which are
described in the Text or Part III of the Annual Reports of the Registrar-General,
particularly for the year 1932. He investigated the influence not only
of density but also of latitude on the general mortality and on specific diseases 
at all ages and at particular age periods. His main conclusion was: 

It seems fair to conclude that it is at these ages (1-5) that the greatest benefits may be
anticipated as the overcrowding evil is mitigated. 

It will be noted that in all these investigations, apart from that by Stocks, 
the effects of environment on mortality have been represented by the total death 
rate. A priori this criterion would seem to be too comprehensive. It is extremely 
unlikely that the child and the adult react in an equal degree to their environ- 
mental conditions, and hence the more appropriate examination would be one 
within specific age periods. 


CHILDHOOD 


There is no evidence that the unborn child is influenced by the mother’s 
environment. The foetus has even been described as a true parasite protected 
against the vicissitudes of the mother and more or less independent even of her 
starvation or dissipation. But once the child begins its separate existence apart 
from its mother its immediate reaction to its surroundings will be best represented 
by a particular quota of the infant mortality. The reason for such discrimination 
is this. The death rate in the first year of life has been regarded as depending on 
three main factors: (1) shock of birth, (2) instability of the nervous and digestive 
system, (3) external factors embracing infection and environment. 

If we accept the aggregated county boroughs and the aggregated rural 
districts as representing two widely divergent environments and then group their 
infantile deaths in age periods into two categories, A and B, in which A includes 
deaths of (1) and (2) character and B those in (3), thus: 


  A                            B
  Premature birth              Measles
  Congenital malformation      Whooping cough
  Congenital debility          Diarrhoea and enteritis
  Injury at birth              Tuberculous diseases
  Convulsions                  Bronchitis and pneumonia


we obtain the following death rates and ratios for the decennium 1921-30: 

                            Age period in months
                         0-1      1-3      3-6      6-12
  Group A:
    County Boroughs     26.94     4.98     2.07     1.42
    Rural Districts     25.92     4.15     1.66     1.33
    Ratio C.B./R.D.      1.04     1.20     1.25     1.07

  Group B:
    County Boroughs      3.06     7.21     9.18    17.06
    Rural Districts      1.96     4.52     4.83     8.70
    Ratio C.B./R.D.      1.56     1.60     1.90     1.96


For the diseases in Group A, which may be regarded as non-preventable 
(deaths from prematurity accounting for the greater proportion), there is little 
difference between the rates in town and country in the first month of life, 4%; 
between the ages of 1 month and 6 months there is greater divergence, 20-25 % 
but, for children aged 6-12 months, the mortality experience in the county 
boroughs is only 7% worse than that in the rural districts. The comparison 
between the experience in town and country for diseases grouped under B is 
vastly different. The effects of an unhealthy environment are apparent in the 
first month of the child’s post-natal existence and they become more accentuated
with age. The mortality amongst children under 1 month in the county
boroughs is 56% higher than that in the rural districts, and for those aged 6-12
months the excess is no less than 96%. In view of this disparity it is obvious
that if further reduction in the national rate for infantile mortality is to be made 
the sanitarian or the public health administrator must concentrate his efforts 
to ameliorate the existing conditions in towns which help to produce the 
relatively high mortality from the diseases classified under Group B. 
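
The ratios and percentage excesses quoted above can be recomputed from the death rates. A minimal sketch follows; the variable names are ours, and the rural Group B rates are taken here as 1.96, 4.52, 4.83 and 8.70, a reading of the printed table that is an assumption where the print is unclear.

```python
AGE_PERIODS = ["0-1", "1-3", "3-6", "6-12"]  # months

county_boroughs = {
    "A": [26.94, 4.98, 2.07, 1.42],
    "B": [3.06, 7.21, 9.18, 17.06],
}
rural_districts = {
    "A": [25.92, 4.15, 1.66, 1.33],
    "B": [1.96, 4.52, 4.83, 8.70],  # assumed reading of the table
}

# Ratio of the county borough rate to the rural district rate, by age period.
ratios = {
    group: [cb / rd for cb, rd in zip(county_boroughs[group], rural_districts[group])]
    for group in county_boroughs
}

for group, values in ratios.items():
    for age, ratio in zip(AGE_PERIODS, values):
        excess = (ratio - 1.0) * 100.0  # percentage excess of town over country
        print(f"Group {group}, {age} months: ratio {ratio:.2f}, excess {excess:.0f}%")
```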

It must not be concluded, however, that the effect of environment is harmful 
only in infancy. The emphasis which has often been laid on the difference 
between the infant death rate in town and country would seemingly suggest 
this conclusion. It would not be valid. The influence of environment in child- 
hood becomes progressively unfavourable and at age 2-3 it is at a maximum. 
This fact was clearly demonstrated by Brownlee. He selected certain life tables 
which related to the decennium 1891-1900 and which represented different 
environmental grades. He next expressed the mortality at ages in each as 
proportions of the rates at the corresponding ages in the Healthy District Life 
Table for the same period 1891-1900. We have chosen three life tables from
his list:

  Healthy district          = Good environment
  E6 (England and Wales)    = Average
  Salford                   = Poor


The values are given in Table I. 
TABLE I


Showing for the period 1891-1900 the ratio of the death rates in England and Wales
(E6), and in Salford, to those of the healthy districts (H3) at individual ages
between 0 and 5 years


           Death rates in                             Ratio
           healthy districts (H3)    England and Wales (E6)       Salford
  Age      Males       Females       Males       Females       Males    Females
  0        132.074     101.327       1.46        1.52          2.20     2.42
  1         28.500      26.421       1.92        1.92          3.39     3.59
  2         10.100      10.355       2.08        1.96          4.00     3.67
  3          7.386       7.093       1.80        1.89          3.11     3.14
  4          5.787       5.657       1.68        1.70          2.63     3.21


It will be seen that the maximum ratio in each instance is definitely at age 
2-3 years. 

To ascertain whether this phenomenon was a mere chance event or persisted
before and after 1891-1900 we classified the registration counties into two groups,
A, mainly urban, B, mainly rural, and calculated the appropriate ratios in the
prescribed age periods. Subsequent to 1910 the ratios were based on

$$\frac{\text{mortality in county boroughs}}{\text{mortality in rural districts}}.$$

The results are given in Table II and are of interest.

TABLE II


Showing the ratio of the mortality at individual ages under 4 years (1) in urban
counties to that in rural counties, and (2) in county boroughs to that in rural districts


                                        Ratio
                                Males                      Females
  Areas       Period       0-1   1-2   2-3   3-4      0-1   1-2   2-3   3-4
  U.C./R.C.   1881-90      1.27  1.74  1.90  1.85     1.32  1.78  1.88  1.87
              1891-1900    1.31  1.87  2.03  1.90     1.37  1.94  2.02  1.94
              1901-10      1.32   …     …     …        …     …     …     …
  C.B./R.D.   1911-14      1.38   …     …     …        …     …     …     …
              1920-23      1.33  2.11  1.84   -       1.35  2.06  1.89   …
              1930-32      1.35   …     …     …        …     …     …     …

Between 1881 and 1910 the ratio for males was highest at age 2-3 years. In
1911-14 a change occurred and the ratio at age 1-2 became the most important 
and has remained so up to the close of our experience. For females the regression 
did not take place until after the war. We can offer a more stringent test to show 
that the relative environmental influence of town and country on mortality is, at 
present, best indicated at age 1-2 years. In the Registrar-General’s Decennial 
Supplement, 1931, Part I, life tables were made for the aggregated county 
boroughs in Northumberland and Durham, which we will call C, and for a 
group of rural districts in the Eastern counties (D). The $q_x$ values of C expressed
as proportions of D are: 


                        Age in years
                   0      1      2      3      4
  Males C/D      1.66   3.77   3.03   2.67   2.19
  Females C/D    1.64   3.54   2.74   2.48   1.99


These two areas C and D represent widely divergent types of environment and 
the maximum ratio is definitely at age 1-2 years. 

It is now important to discover the particular diseases of childhood initially 
responsible for the highest ratio being in certain years. We are able to do this 
because in the Decennial Supplement for 1901-10, pp. ccxxiv-ccxxvi, the
Registrar-General published the death rates from All Causes, specific diseases
and groups of diseases in infancy and in the individual years of childhood up to
age 5 during the period 1906-10 for two groups of counties, the one urban in
character, the other rural. The rates and the ratio of the urban to the rural
mortality are given in Table III.

It will be seen for All Causes of death that, although the ratio at age 2-3 
years is still the largest, 1.97, it is really not much in excess of the value at the
younger age. This is not unexpected, because we previously indicated that after 
1911-14 the maximum index was definitely at age 1-2 years and the period 
1906-10 is seemingly the transitional one: at least it borders that in which the 
regression occurred. The specific groups of diseases for which the ratio was 
definitely highest at age 2-3 years were the common infections, tuberculous, 
developmental and wasting. 

Owing to the fact that deaths according to extent of urbanization and 
specific cause are no longer published for individual years of life between age 2 
and 5 years (the existing age classification being 0-1, 1-2, 2-5) we are unable to 
indicate the diseases which produced the change in the age occurrence of the 
maximum ratio. 

TABLE III


Showing for the period 1906-10 the deaths per 1000 survivors (both sexes)
at the commencement of each year in England and Wales


                                                              Ages
                                                       1-2     2-3     3-4     4-5
  I. Common infectious diseases          Urban        11.78    5.87    4.11    2.98
                                         Rural         4.88    2.36    2.01    1.73
                                         Ratio U./R.   2.41    2.49    2.04    1.72

  II. Diarrhoeal diseases                Urban         5.03    0.83    0.29    0.16
                                         Rural         1.59    0.31    0.16    0.10
                                         Ratio U./R.   3.16    2.68    1.81    1.60

  III. Developmental and wasting         Urban         1.05    0.22    0.08    0.04
       diseases                          Rural         0.65    0.13    0.07    0.03
                                         Ratio U./R.   1.61    1.69    1.14    1.33

  IV. Tuberculous diseases               Urban         3.93    2.10    1.34    1.03
                                         Rural         2.14    1.03    0.71    0.66
                                         Ratio U./R.   1.84    2.04    1.89    1.56

  V. Miscellaneous diseases              Urban        19.42    7.58    4.40    3.21
                                         Rural        11.74    4.61    2.90    2.24
                                         Ratio U./R.   1.65    1.64    1.52    1.43

  VI. All causes                         Urban        41.21   16.60   10.22    7.42
                                         Rural        21.00    8.44    5.85    4.76
                                         Ratio U./R.   1.96    1.97    1.75    1.56


The statistics of the mortality in childhood which we have so far examined
for town and country support the viewpoint that the differences can be explained
on the basis of earlier infection in the former. The pre-school child, as a consequence
of his environment, is infected at an age when he is least able to resist
a fatal attack. In rural districts infection occurs at a later age. Picken (1921)
demonstrated this fact in connexion with measles. He calculated at two periods,
1891-1902 and 1903-12, the mean age of attack in a rural and in an urban community
and found on both occasions that the former had the higher mean age.

MORTALITY AND TYPE OF DWELLING 

That environmental conditions influence the age mortality in this manner is 
evident in the statistics of Glasgow for the years 1909-12. The population and
deaths from specific diseases during that period were classified according to age 
and type of dwelling—whether one-, two-, three- or four-apartment houses. The 
final rates were published in the Medical Officer’s Annual Report for the year 
1912, and in Table IV we present in certain age periods the mortality from a 
group of infectious diseases. 

It will be observed that when the mortality at age 0-1 year in the one- 
apartment house is represented by 100, that in the four-apartment house is 36: 
at age 1-5 the difference is still more outstanding as the death rate in the best 
type of house is only 17 % of that in the presumably most overcrowded dwelling. 
TABLE IV


The male death rate from a group of infectious diseases* per 1000
of the population according to the type of house in Glasgow, 1909-12


                                           Age (years)
  Type of houses               0-1      %      1-5      %     5-15      %
  One-apartment houses        49.14    100    19.19    100     1.96    100
  Two-apartment houses        38.55     78    13.02     68     1.84     94
  Three-apartment houses      20.95     43     7.78     41      …        …
  Four-apartment houses       17.63     36     3.19     17     2.52    129


* Diphtheria, scarlet fever, measles, whooping cough, diarrhoea and enteritis.


In the next age period, 5-15, the sequence is completely changed as the mor- 
tality of well-housed children is now 29 % higher than that in the one-apartment 
house. Why has the trend of the mortality in this social range differed as between 
pre-school and school age? There is only one adequate explanation. The 
children in the worst environment—the one-apartment house—in addition to 
being possibly of a lower nutritional standard, had been infected in the pre-school
life, age 0-5, with a resultant high mortality, whereas the children in the
highest social class were not seriously exposed to infection until they attended
school. 
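
The percentage columns of Table IV express each death rate as a percentage of the rate in one-apartment houses at the same age. A minimal sketch of that calculation, with names of our own choosing; the three-apartment entry at ages 5-15 is not legible in the table and is therefore left out rather than guessed.

```python
# Male death rates per 1000 from the group of infectious diseases,
# Glasgow 1909-12, as read from Table IV.
rates = {
    "one-apartment": {"0-1": 49.14, "1-5": 19.19, "5-15": 1.96},
    "two-apartment": {"0-1": 38.55, "1-5": 13.02, "5-15": 1.84},
    "three-apartment": {"0-1": 20.95, "1-5": 7.78},  # 5-15 entry illegible
    "four-apartment": {"0-1": 17.63, "1-5": 3.19, "5-15": 2.52},
}


def relative_index(house_type, age_group, base="one-apartment"):
    """Death rate as a percentage of the base house type at the same age."""
    return 100.0 * rates[house_type][age_group] / rates[base][age_group]


for house_type, by_age in rates.items():
    indices = {age: round(relative_index(house_type, age)) for age in by_age}
    print(house_type, indices)
```

The reversal discussed in the text shows up as an index below 100 for four-apartment houses at ages 0-1 and 1-5 and above 100 at ages 5-15.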


OVERCROWDING AND MORTALITY IN LONDON BOROUGHS 


The effects of environment on health can be suitably studied in the boroughs 
of London. If we accept overcrowding, i.e. the percentage of the population 
living more than two in a room, with its physical, mental and economic implica- 
tions, as an indication of an unhealthy environment, then these areas represent 
a wide range of hygienic conditions. The range of overcrowding at the 1931 
census was from 4% in Hampstead to 29% in Shoreditch and Finsbury. To 
assess the extent to which the mortality is associated with environment we 
have correlated the death rates at age periods amongst females in each of the 
boroughs with the overcrowding index. We specifically selected females because 
they, and certainly the mothers, are more exposed to risks of their particular 
environment than are the males, who in all probability work outside it. 

TABLE V


The correlation coefficients between the female mortality from all causes
and overcrowding in the London boroughs, 1929-33


  Ages       r and S.E.
  0-1        0.405 ± 0.158
  1-2        0.813 ± 0.064
  2-3        0.522 ± 0.137
  3-4        0.396 ± 0.159
  4-5        0.334 ± 0.168
  5-15       0.356 ± 0.165
  15-25      0.362 ± 0.164
  25-45      0.518 ± 0.138
  45-65      0.910 ± 0.032
  65-75      0.794 ± 0.069
  75+        0.650 ± 0.109


The trend of the coefficients in Table V clearly indicates the necessity of
taking age into consideration in any discussion on the influence of environment
on health. The coefficient at age 0-1 is 0.405 ± 0.158 but, in the next age group,
it is no less than 0.813 ± 0.064. This latter value is the second largest coefficient
in the series, and it confirms our previous discovery that, in childhood, the age
1-2 years is now the most responsive to hygienic conditions. After this age the
coefficient becomes progressively smaller and is at a minimum, 0.334 ± 0.168,
at age 4-5 years. The upward trend begins again, but it is slow at first. It
becomes more defined as middle age is reached and culminates in the highest
peak value of 0.910 ± 0.032 at age 45-65. We thus see that environmental
influence on mortality is strongest at two periods of life, at age 1-2 and at age
45-65. Its occurrence at these ages as revealed by the statistics for London is
not a mere chance happening. It is also characteristic of the statistics of other
places. We have previously seen it demonstrated at the younger age by the
ratios of the mortality of the county boroughs in Northumberland and Durham
to that of the rural districts in Eastern England while, at the older age, Brownlee,
as will be seen below, using a narrower age limit than 45-65, indicated its
existence in the age group 45-50, when he expressed the death rate at this age
in Salford and in E6 (England and Wales), respectively, as a ratio of that in
H3 (Healthy Districts).


               Death rate                          Ratios
             H3 (1891-1900)       E6 (1891-1900)       Salford (1891-1900)
  Age        Males   Females      Males   Females      Males   Females
  35-40       6.26     5.76       1.44     1.36        1.59     1.90
  40-45       7.57     6.82       1.58     1.47        2.51     2.14
  45-50       9.32     7.83       1.60     1.50        2.67     2.41
  50-55      12.55    10.22       1.56     1.47        2.42     2.39


The smallness of the correlation coefficients between the ages of 20 and 35 
years is rather surprising in view of the fact that at this period of life tuber- 
culosis is the most important cause of death and its incidence is higher in the 
slums than in residential districts. Hence it may well be asked: why should the 
correlation between mortality and bad social conditions be more manifest at
ages 1-2 and 45-65 years than at any others and what specific diseases were 
unduly affected? As far as the younger age is concerned we have previously 
incriminated the infectious group. At the older age period we are probably not 
witnessing any intensification in the effects of environment on the individual, 
but rather the result of accumulated strain of having long endured conditions 
of living which were deleterious to health. The strain would inevitably be most 
manifested in middle life, the period when the physiological mechanism of
women is most disturbed. To obtain some idea of the diseases responsible we 
abstracted the important specific causes of death at this age, 45-65, and corre- 
lated their death rates in the various boroughs with the corresponding over- 
crowding values. The results were as follows: 


Values of r and S.E.


  Respiratory tuberculosis and overcrowding       0.688 ± 0.100
  Other respiratory diseases and overcrowding     0.837 ± 0.056
  Cancer and overcrowding                         0.357 ± 0.153
  Circulatory diseases and overcrowding           0.803 ± 0.067
  Other diseases and overcrowding                 0.731 ± 0.088


The correlation with cancer is small, but it is high with the other diseases and 
suggests that bad hygienic conditions are associated with general ill-health rather 
than with one specific cause. 
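
The text does not say how the standard errors attached to the correlation coefficients were obtained. Most of the printed values in Table V agree with the classical large-sample formula (1 - r²)/√N taken with N = 28; both the formula and the value of N are inferences from the figures, not statements made by the authors. A sketch of that check, with a function name of our own:

```python
import math


def standard_error_of_r(r, n_areas=28):
    """Large-sample standard error of a correlation coefficient.

    Assumption: S.E. = (1 - r**2) / sqrt(N), with N = 28 areas inferred
    from the printed values rather than stated in the text.
    """
    return (1.0 - r * r) / math.sqrt(n_areas)


for r in (0.405, 0.813, 0.334, 0.910):
    se = standard_error_of_r(r)
    print(f"r = {r:.3f}, S.E. = {se:.3f}, r / S.E. = {r / se:.1f}")
```

Under this formula the standard error shrinks as r approaches 1, so the two peak coefficients are also the most precisely determined.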

Before concluding the mortality aspect of our investigation one particular 
point needs some explanation. We have previously declared that the relationship 
between density and mortality is best described by a curvilinear equation

$$m = cd^{n},$$

and our subsequent adoption of the correlation coefficient which implies
linearity seems rather illogical. It really is not so. Hitherto we were solely
concerned with describing the association between density and the total mor- 
tality at all ages. But even if the relationship was non-linear at each specific age, 
we could deduce from the slopes of the best fitting straight lines the age period 
in which the association between the two variables was the most defined. 


MORBIDITY FROM INFECTIOUS DISEASE AND ENVIRONMENT 


The complete effects of environment on health are inadequately expressed 
when measured in terms of mortality because there may be a considerable 
amount of sickness in a population, yet the patients may not die. There are 
many diseases for which the number of deaths or the death rate is no criterion 
of the general prevalence. Scarlet fever is a classic example. The incidence of 
that disease—the number of notified cases as a ratio of the number of the 
population exposed to risk—at age 0-15 years is practically as high now as it 
was thirty or forty years ago, but the killing power of the disease is not nearly 
so intense. We thus have a picture of a low mortality accompanying a high 
morbidity. Even the adoption of case rates as an accurate index of prevalence 
is in a sense insufficient, because we cannot be certain of either complete 
notification or correctness in diagnosis of the notified cases. In London, the 
diagnostic error for scarlet fever is approximately 10% of the cases admitted 
to hospital. Roughly a quarter of the cases sent to hospital as diphtheria are 
suffering from something else—tonsillitis or laryngitis; while for enteric the 
error is in the neighbourhood of 30%. But despite the inaccuracy with which 
the case rates of infectious disease are invested, they nevertheless convey a more 
complete picture of the presence of infection in the general population than is 
possible from a study of the mortality. Hence it is reasonable to suppose that 
the influence of environment would be more clearly demonstrated with mor- 
bidity than with mortality, assuming that adverse environmental conditions, 
as expressible in terms of overcrowding, are deleterious to health. 

Nowhere, it would seem, can the relationship between morbidity and un- 
healthy conditions of living be better examined than in the boroughs or sanitary 
divisions of a large city, because these areas possess a homogeneity which is not 
so apparent in the different sections of the whole community. If our supposition 
is correct, and a priori it seems reasonable, that bad environmental conditions 
are inimical to health, then we should expect to find fairly high positive correla- 
tion between the variables in question. The association cannot be perfect, 
because where there is a high concentration of density there will inevitably be 
some degree of immunity against infection acquired by sub-clinical attack. 

Although we have suggested that the relationship between density and 
disease is best measured within the sanitary or administrative subdivisions of a 
city it does not follow that the association will be equally, or even approximately, 
the same for different cities or for different infections in the same city. The 
correlation between overcrowding and the case rate at age 0-15 years for scarlet 
fever in Glasgow and in London supports this viewpoint: 

Scarlet fever and overcrowding 


Period      London                 Period       Glasgow 

1901-10     r = +0.135 ± 0.186     1899-1902    r = -0.860 ± 0.045 
1911-14     r = -0.353 ± 0.165     1903-08      r = -0.861 ± 0.054 
1919-23     r = -0.064 ± 0.188     1909-13      r = -0.663 ± 0.120 


In London the prevalence of scarlet fever is little influenced by social 
conditions as the coefficients are very small. If there be a relationship it is of a 
slightly inverse character, as in two out of three instances the coefficients are 
negative. On the other hand, in Glasgow there is a well-marked tendency for 
residential districts to have relatively more scarlet fever than the poorer areas. 
A plausible explanation of this phenomenon in Glasgow may be the greater 
immunization by minimum dosage in the slums than in the better class districts. 
But why the differentiation between the two cities? Possibly the type of housing 
in Glasgow is the responsible factor. There is a great difference between the 
housing systems in the two cities, as is apparent from the following facts obtained 
at the 1921 Census. 


No. of      Percentage of population       Room space per person 
rooms       London        Glasgow          London        Glasgow 
(a)         (b)           (c)              (d)           (e) 

1            6.2          13.2             0.56          0.31 
2           17.5          51.5             0.64          0.42 
3           [illegible]   [illegible]      0.78          0.63 
4           [illegible]    6.3             0.90          0.85 
5           11.8           3.0             1.04          1.15 


The distribution of the population in London according to the number of 
rooms occupied is fairly symmetrical, whereas in Glasgow the curve of incidence 
is rather skew. We find that 80.2% of the total population of London and 
94.8% of that of Glasgow lived in homes containing one to five rooms. The 
disproportion was more strongly marked at the bottom end of the scale. In 
London, 6.2 and 17.5% of the population lived in homes of one and two 
rooms: the corresponding proportions in Glasgow were 13.2 and 51.5%. 

These figures are, in themselves, not necessarily indicative of overcrowding, 
because the smaller proportions of the population in London (6.2 and 17.5%) 
could be composed of families containing one or more members, whereas the 
constitution in Glasgow could be that of individual members. (According to 
Census regulations, a lodger occupying part of a house or flat is treated as a 
separate family.) But when the data in cols. (b) and (c) are supplemented by 
those in cols. (d) and (e) they demonstrate clearly the unsatisfactory position 
of Glasgow. The range between bad and good conditions is more accentuated 
in Glasgow than in London; the room space per person extends from 0.31 
in the one-room house to 1.15 in the five-room house in Glasgow, the comparable 
values in London being 0.56 and 1.04. Amongst the sections of the total population 
living in one and two rooms in Glasgow the room space per person was 44 
and 35%, respectively, less than in London. Arising out of this greater concentration 
or massing of the population in tenement dwellings with deficient 
room space per person there will inevitably be greater opportunity in Glasgow 
than in London of acquiring immunity to the disease. 
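
The percentage deficits quoted above follow from simple arithmetic on the room space per person. A minimal sketch, taking the one-room values from the text and the two-room values as read from the table (the function name is illustrative only):

```python
def percent_less(glasgow, london):
    """Percentage by which the Glasgow figure falls short of the London one."""
    return 100.0 * (london - glasgow) / london


# Rooms per person: one-room and two-room dwellings.
one_room = percent_less(0.31, 0.56)
two_rooms = percent_less(0.42, 0.64)
```

These come to roughly 44.6 and 34.4, in line with the 44 and 35% stated.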

On the other hand it may be suggested that the differences between the 
correlation values for the two cities, and especially the large negative coefficient 
for Glasgow, may be due to variation in the standard of notification in the two 
cities. If there exists for the Glasgow slum children incomplete notification of 
the disease in the pre-school stage, more complete notification at the school age, 
and, in the residential areas, complete notification for all children, then, on this 
hypothesis, there will be a negative correlation between incidence and over- 
crowding. Brownlee was of the opinion that scarlet fever in Glasgow was a milk- 
borne infection, and as children in the residential areas consumed relatively more 
milk than the children in the poorer districts the higher incidence of scarlet 
fever in the former followed as a consequence. Hence we have three possible 
explanations and there may be others, but in the light of the abnormal type of 
housing in Glasgow we are inclined to adhere to the opinion that the pheno- 
menon is best explained by the greater degree of latent immunity amongst 
those children who live under bad environmental conditions. This view is 
reinforced by the fact that the diphtherial experience in Glasgow and in London 
respectively is practically identical with that for scarlet fever. 

Although the type of environment is inversely related to the incidence of 
infections such as scarlet fever and diphtheria, yet when it is correlated with the 
fatality arising from those infections the association is highly positive, as will 
be seen from the following values for Glasgow: 


Period        r and S.E. 

1899-1902     0.586 ± 0.114 
1903-08       0.789 ± 0.079 
1909-13       0.727 ± 0.096 
1921-25       0.619 ± 0.092 


There is nothing abstruse in the interpretation of these values. The slum child 
in an overcrowded home with implications of earlier age of infection and 
malnutrition is less able than the child in the higher social grades to resist a 
fatal attack. 

There are other notifiable diseases the contraction of which as far as is 
known confers no subsequent absolute immunity, but which are definitely 
related to prevailing hygienic conditions. Erysipelas exemplifies this class and 
its statistical experience in Glasgow and in London at different periods is as 
follows: 


Erysipelas and overcrowding 


Period      London                 Period       Glasgow 

1901-10     r = +0.834 ± 0.058     1899-1902    r = +0.515 ± 0.128 
1911-14     r = +0.641 ± 0.111     1903-08      r = +0.698 ± 0.105 
1919-23     r = +0.745 ± 0.084     1909-13      r = +0.713 ± 0.103 


The high positive correlation is probably due to the fact that in congested areas 
there is greater likelihood of abrasions being followed by a supervening infection 
with the streptococcus erysipelatis. 

Our next aim was the presentation of a general picture of the relationship 
between the morbidity from infectious disease and the range of environmental 
conditions in the different administrative areas of England and Wales. Accord- 
ingly we abstracted the recorded number of notified cases of scarlet fever, 
diphtheria, enteric and erysipelas for the period 1921-30 in the London boroughs, 
each county borough and each urban and rural district and correlated the case 
rates with the mean value of the corresponding overcrowding index as recorded 
at the 1921 and 1931 censuses. The coefficients obtained for the different areas 
according to their geographical location are given in Table VI. 

We are mindful of the fact that the values obtained are influenced by one 
important consideration—we were unable to make any allowance for variation 
in the standard of notification in the different areas. This factor may not be of 
serious import within the administrative areas such as the county boroughs 
because the percentage of all cases notified may not vary very much from city 
to city. In all probability it will affect comparison made between the adminis- 
trative areas such as county boroughs and rural areas, as it is most unlikely 
that the standard of notification is the same in town and country. Many of the 
values in the table are statistically unimportant, but there are points of interest 
—the chief of which is the relationship between the incidence of erysipelas and 
environment. There is a positive correlation between the two variables in every 
part of the country but it diminishes with decreasing urbanization. In London 
the coefficient was 0.665 ± 0.103: in the aggregate rural districts 0.330 ± 0.039. 
The incidence of scarlet fever in London is positively but insignificantly related to 
environment—as in past experience—but that of diphtheria is much more 
definite, r being 0.363 ± 0.161. It will be noted that the statistical experience 
of these two diseases in the rural districts resembles that of London inasmuch 
as the association is positive. The relationship exhibited in the county 
boroughs, particularly in the North, is entirely different from that elsewhere. 
The association is negative, resembling in this respect that of Glasgow. The 
concordance is not surprising, as the environmental conditions of the 
northern towns, in all probability, are as unsatisfactory as those of Glasgow 
and the opportunity of acquiring immunity to either disease by a sub-clinical 
attack is probably just as readily obtained. 
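
One rough way of judging which coefficients are "statistically important" is to express each as a multiple of its standard error. The sketch below does only that; any cut-off (two or three times the standard error is a common convention) is an assumption of this illustration and is not stated in the paper.

```python
def ratio_to_se(r, se):
    """Absolute size of a coefficient measured in units of its standard error."""
    return abs(r) / se


# Coefficients and standard errors quoted in the text for 1921-30.
quoted = {
    "erysipelas, London": (0.665, 0.103),
    "erysipelas, rural districts": (0.330, 0.039),
    "diphtheria, London": (0.363, 0.161),
}

ratios = {name: ratio_to_se(r, se) for name, (r, se) in quoted.items()}
```

On this reckoning the erysipelas coefficients are more than six times their standard errors, while the London diphtheria coefficient is only a little over twice its own.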

The association with enteric is rather vague as, apart from that for the urban 
districts in Wales, not one of the coefficients is statistically important and 
the signs are almost evenly distributed. In the component regional divisions 
the negative sign occurs six times and the positive five times. In view of the 
sporadic occurrence of this disease and the different channels by which the 
infection may be conveyed, particularly by “carrier”, the indefinite relationship 
between the variables as exhibited by the series of coefficients in the table is, 
in a sense, not surprising. 


CONCLUSIONS 


The points of interest and the conclusions warrantable from this study are: 

1. The relationship between general mortality and density of population or 
overcrowding is appropriately described by an equation of the character used 
by Farr: Mortality = C × Density^n. 
For a certain range of density there is a corresponding increase in the mortality, 
but a saturation level is reached when for further increase in density there is no 
accompanying increase in mortality. The statistics of overcrowding in the 
London boroughs when plotted against the corresponding standardized death 
rates for the period 1930-2 reveal a definite curvilinear relationship (Diagram 
on p. 346). 

2. The effects of a bad environment on health are particularly noticeable 
at two periods of life—in childhood and in middle life. 

Accepting the mortality in town and in country, respectively, as representing 
indices of two widely different environments it was found that during childhood 
the greatest divergence occurred at age 2-3 years. This was a characteristic feature 
until 1911. Afterwards there was a transition and age 1-2 years takes priority 
(Tables I and II). This fact is further confirmed by the correlation coefficients 
between overcrowding and age mortality in London, as the highest coefficients 
were those at age 1-2 years and at age 45-65 years (Table V). 

The diseases responsible for the initial manifestation (the high ratio at 2-3 
years) were mainly those in the infectious group (Table III). At the older age 
period it would appear that bad hygienic conditions are associated with general 
ill-health rather than with one specific cause. 

3. Type of dwelling is highly related to the mortality from infectious 
disease, as is observable in the statistics for Glasgow (Table IV). Children living 
in single-room dwellings have a very high mortality in pre-school life as com- 
pared with children living in larger sized homes. A probable explanation is that 
they are sooner exposed to infection and less able to withstand a fatal attack. 
Children in the better class house get infection when they come to school, as 
is evident from their higher mortality at age 5-15 years. 

4. Morbidity is a better index of environmental influence than mortality 
because, for diseases such as scarlet fever, mortality is no criterion of its pre- 
valence. The incidence of scarlet fever and diphtheria is negatively correlated 
with overcrowding in Glasgow but not in London. The difference or distinction 
may be due to a possibly greater degree of latent immunity amongst Glasgow 
children as a consequence of the unique housing conditions in that city. 

5. The response of infection to environment differs considerably both for 
type and location of administrative area (Table VI). Undoubtedly some of the 
difference between the coefficients for town and country areas is due to varying 
standards in the notification of infectious disease. The relationship between 
overcrowding and the incidence of scarlet fever and diphtheria in the county 
boroughs of the North is an inverse one—similar to that in Glasgow and probably 
capable of a similar explanation. 

Erysipelas is an instance of a particular disease definitely associated with 
hygienic conditions, as the correlation coefficients are positive in all parts of 
the country. The correlation between environment and enteric is very indefinite, 
but this is probably due to the many factors which can be responsible for the 
spread of this particular infection. 


ON SENTENCE-LENGTH AS A STATISTICAL CHARACTERISTIC 
OF STYLE IN PROSE: WITH APPLICATION TO TWO 
CASES OF DISPUTED AUTHORSHIP 


By G. UDNY YULE 


Section I. INTRODUCTORY 


One element of style which seems to be characteristic of an author, in so far as 
can be judged from general impressions, is the length of his sentences. This 
author develops his thought in long, complex and wandering periods: that finds 
sufficient for his purpose a sequence of sentences that are brief, clear and per- 
spicuous. Since the length of a sentence can be readily measured, for practical 
purposes, by the number of words, it occurred to me that it would be of interest 
to subject this impression to statistical investigation. 

In carrying out the investigation, I met with more difficulties than I had 
foreseen. There are two terms used above: (1) Sentence, (2) Word. What is a 
sentence? What is a word, or what for present purposes is to be regarded as a 
word? 

Sentence. Let me cite the New English Dictionary: 

SENTENCE. sb. 6. A series of words in connected speech or writing, forming the 
grammatically complete expression of a single thought; in popular use often (= Period sb. 10) 
such a portion of a composition or utterance as extends from one full stop to another. In 
Grammar, the verbal expression of a proposition, question, command, or request, containing 
normally a subject and a predicate (though either of these may be omitted by ellipsis). In 
grammatical use, though not in popular language, a sentence may consist of a single 
word....English grammarians usually recognise three classes: simple sentences, complex 
sentences (which contain one or more subordinate clauses), and compound sentences (which 
have more than one subject or predicate). 

From these definitions I conclude, I hope rightly, that we may drop the term 
“period” and use the term “sentence” to cover any sentence (or as I should 
have been inclined to write “period”), however complex and however compound 
in the senses defined. It is convenient to be able to avoid a term which to a 
statistician would generally suggest a different meaning. Now, not being a 
grammarian but just one of the populace, I confess that I started with the 
popular notion of a “sentence” in this general sense: “such a portion of a 
composition as extends from one full stop to another”, and thought I would 
have nothing to do but tot up the words from full stop to full stop. The first 
definition, however, reads: “the grammatically complete expression of a single 
thought.” I feel some doubts as to the “single thought”. (Is not “I am tired 
and hungry” a sentence, and does it not convey two thoughts, the thought of 
being tired and the thought of being hungry?) But the “grammatically complete 
expression” surely is essential to make a word-series a sentence; the word-series 
must be what Webster calls a “sense unit”, and the trouble is that, especially in 
older works, “a portion of a composition” which “extends from one full stop to 
another” is often not the grammatically complete expression of anything. When 
the author or compositor has used punctuation in this fashion it is no longer 
possible simply to add up words from one full stop to the next, paying little or 
no attention to sense: it is necessary for the reader frequently to pull up and ask 
himself if the words just read do or do not form a sentence, and if they do not, 
what are in fact the limits of the sentence within which they must be assumed to 
lie. I need hardly point out how much this increases labour, and even, if the 
sentences are very long and complicated, brings in largely the element of personal 
judgement. Two readers, at least unskilled readers like myself, may well differ 
as to where a given sentence terminates. 

Here is quite a simple illustration of the difficulty from a modern essay on 
The Politics of Burns (ref. 1, at end of paper): 


There are several points here all at once calling for notice, and seldom getting it from 
friends of the poet: 


The extraordinary talent for history shown by Robert Burns. 
His attention to British History in preference to Scottish. 
The originality of his views. 


In this passage there are four word-series, the first divided from the second 
only by a colon (though the second begins with a capital letter), the second 
divided from the third, and the third from the fourth, by full stops. But neither 
the second, nor the third, nor the fourth word-series is a grammatically complete 
expression. The whole passage must be taken together, as it seems to me, as one 
single sentence. I am of course simply illustrating my difficulty, not criticizing 
the punctuation. 
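
The difficulty can be made concrete with a small sketch that counts the Burns passage in two ways: mechanically from full stop to full stop, and as the single sentence it is judged to be. The splitting rule and the word pattern here are deliberate simplifications adopted for illustration.

```python
import re

PASSAGE = (
    "There are several points here all at once calling for notice, "
    "and seldom getting it from friends of the poet: "
    "The extraordinary talent for history shown by Robert Burns. "
    "His attention to British History in preference to Scottish. "
    "The originality of his views."
)


def count_words(text):
    """Count runs of letters; adequate for this passage only."""
    return len(re.findall(r"[A-Za-z]+", text))


def split_on_full_stops(text):
    """Naive rule: every full stop ends a 'sentence'."""
    return [piece.strip() for piece in text.split(".") if piece.strip()]


naive_lengths = [count_words(piece) for piece in split_on_full_stops(PASSAGE)]
judged_length = count_words(PASSAGE)
```

The mechanical rule yields three "sentences" of 29, 9 and 5 words, whereas reading the passage as one sentence gives a single length of 43.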

On the other hand, where an author has written a very long and meandering 
sentence, a question may well arise between two different readers as to whether a 
halt should not be called in the middle, and a full stop entered where author or 
compositor has placed only a colon. 

I say author or compositor, for it must not be assumed that one is necessarily 
laying sacrilegious hands on the deliberate construction of the author himself. 
“So far as punctuation is concerned,” says McKerrow (ref. 2), “there seems very 
little evidence that many authors exercised any care about it whatever. After 
all, even at present, few authors trouble to punctuate their MSS. with any care 
or consistency. Such punctuation as is found in ordinary MSS. of the sixteenth 
and seventeenth centuries is indeed most erratic and seldom goes beyond full 
stops at the end of most of the sentences and some indication of the caesura in 
verse.” I had, before I started the present work, expected that this comment 
would apply much more to intermediate punctuation than to full stops, trusting 
that authors would at least insert “full stops at the end of most of their sentences”. 
But that it applies to both was enforced on me by different versions of the short 
tract by Gerson, De Meditatione Cordis, in the edition of his complete works that 
I used (see below section III and ref. 9) and in four editions of the Imitatio 
Christi on my shelves. The versions differed, not only verbally, but also as regards 
full stops. If punctuation, even as regards full stops, is largely the work of the 
compositor, there need be no hesitation in overriding them if necessary: indeed, 
the use of personal judgement seems unavoidable. 

Let me add that at first I by no means realized the full extent of this difficulty, 
and when I did often felt myself horribly incompetent to deal with it. I am sure 
my final decisions could often be contested, and were not infrequently in- 
consistent with one another. But after all difficult cases are but a small propor- 
tion of all sentences in most writers and, if only as an exploratory piece of work, 
I hope the investigation may still retain interest and value. 

Word. Compared with the difficulties as to the sentence, the difficulties 
concerning words are really of a minor kind. One large class is indicated by the 
lines of Calverley: 

    Forever; ’tis a single word! 
    Our rude forefathers deemed it two: 
    Can you imagine so absurd 
    A view? 


Our rude forefathers also wrote it self, any where, every where and so forth, where 
their rude descendants write itself, anywhere, everywhere. How shall we reckon 
such expressions? It is best, I think, to follow modern usage and I generally 
endeavoured to do so; but in rapid counting it is very easy to make a slip. 
Hyphened words present the same sort of difficulty. Law-courts, china-manufacturer, 
news-journal, well-earned, I would count as two words each; out-of-the-way 
as four: but co-acervation, contra-distinguish, tri-syllabic, pre-disposed, re-produce, 
as one each. A something-nothing-every-thing (Coleridge) presents a 
special problem: I think it should be three words. But how many words is 
matter-of-factness? Coleridge calls it a word, “an uncouth and new coined 
word”. 

Then there are abbreviations such as viz., i.e., etc. or &c. The first there is 
no reason to reckon as anything but one word. The second, third and fourth, in 
spite of their meaning, I also reckoned as one each: eye and mind grasp them as 
wholes. 

Finally, what are we to do with figures? Dates may occur even in literary or 
historical essays: any year stated in figures (1825 or 1798) I reckoned as a word. 
Whether days of the month ever occurred I do not recall: but I would reckon 
the day of the month stated in figures, as in January 10th, as a word for the 
month and a word for the number of the day. Any actual number if stated in 
figures, and such numbers are frequent of course in the work of Graunt and 
Petty that I have discussed, would be reckoned as one word whatever the 
number. Thus 251 would be reckoned as one word and so would 3,251,452; 
although two hundred and fifty one would be five words, and three million, two 
hundred and fifty one thousand, four hundred and fifty two would be thirteen. 
This may seem arbitrary: but again, if the number is stated in figures eye and 
mind grasp it as a whole, while if in words it has to be taken word by word. For 
the same reason, fractions such as ? or 3, which are also frequent in Graunt and 
Petty, were reckoned as a word each. Sums of money stated in figures, such as 
£1. 2s. 8d. were to the best of my recollection treated as if pounds, shillings and 
pence were so expressed in words—not very consistently with the principle 
stated above. If any matter was so full of figures that it practically ceased to be 
prose even in the humblest sense of that term, if for example it was set out in 
tabular or semi-tabular form, it was simply cut out. 

In all such instances as the above I really do not think it is of very much 
practical consequence what rule is adopted: nor even of much practical con- 
sequence if the treatment is not always self-consistent. Sentences vary too much 
in length for what are after all minor errors of measurement to be of much 
consequence. 
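
The counting conventions described above can be collected into a small routine. This is only a sketch of the stated rules: the list of prefixes is taken from the examples given and is an assumption rather than a complete rule, and special cases such as "something-nothing-every-thing" or sums of money are not handled.

```python
# Prefixes after which a hyphened form is reckoned as ONE word
# (taken from the examples in the text; the list itself is an assumption).
PREFIXES = {"co", "contra", "tri", "pre", "re"}

# Punctuation removed from both ends of each whitespace-separated chunk.
EDGE_PUNCTUATION = ".,;:!?()[]\"'“”‘’—–-"


def words_in_chunk(chunk):
    """Number of words reckoned for one whitespace-separated chunk."""
    core = chunk.strip(EDGE_PUNCTUATION)
    if not core:
        return 0
    if core[0].isdigit():
        return 1  # a number or date in figures is one word, e.g. 3,251,452
    parts = [p for p in core.split("-") if p]
    if len(parts) > 1 and parts[0].lower() in PREFIXES:
        return 1  # co-acervation, pre-disposed, ...
    return len(parts)  # law-courts -> 2, out-of-the-way -> 4, viz. -> 1


def count_words(text):
    """Total words in a passage under the conventions above."""
    return sum(words_in_chunk(chunk) for chunk in text.split())
```

For instance, `count_words("two hundred and fifty one")` gives 5 while `count_words("251")` gives 1, matching the treatment of numbers in words and in figures.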

Quotations. I may mention in conclusion one other difficulty. What is to be 
done with quotations? Two cases seem clear. If the author makes a brief 
quotation forming grammatically part of his own sentence, he is only substituting 
someone else’s words for his own and they must be counted in: as in 
Lamb’s 
But I am none of those who— 
Welcome the coming, speed the parting guest. 


If, on the other hand, the author simply quotes a complete sentence from 
somebody else, that is not the author’s writing and must be omitted: as for 
example when the same author writes 
A gag-eater in our time was equivalent to a goul, and held in equal detestation. — 
suffered under the imputation. 
—Twas said 
He ate strange flesh. 


The quotation must be dropped. But no rule can be applied strictly to living 
literature. Thomas à Kempis, for example, quotes the words of scripture so 
freely that if one cut out scriptural quotations one would eliminate a consider- 
able proportion of his work. He has made scripture his own, and what he has 
written must stand as his. 

A serious difficulty arises only when, say, an essayist is discussing a poet and 
makes a long and purely illustrative quotation. This may be of any length, and 
it may be so made as virtually to form part of the sentence of the critic himself, 
or may follow almost indifferently a colon or a full stop at the end of the critic’s 
sentence. Quotations made in the first way, and even those made in the second 
way after a colon, I tended at first to include. But, on coming across very long 
quotations, it became obvious that this was unsatisfactory, and I then adopted 
the easier method of simply cutting out all pages on which this source of trouble 
was serious. This is, I think, the best course. 


Section II. FROM BACON, COLERIDGE, LAMB 
AND MACAULAY 


This section is in part purely illustrative, showing what sort of distributions 
of sentence-length we may expect, but in part is concerned with the fundamental 
question, how far sentence-length is really a characteristic of an author’s style. 
If, that is to say, we take two lengthy passages, each containing a few hundred 
sentences, from a given fairly homogeneous work, will they present us with 
proportional numbers of sentences of each particular length in reasonably close 
agreement with one another? If they do not; if, although dealing with the same 
sort of material in the same sort of way, the author is liable capriciously to vary 
in the length of his sentences, sentence-length is not a characteristic of his style 
in any proper sense of the term, and one’s impression to the contrary will be 
proved mistaken. If, however, there is reasonably close agreement, we can 
accept sentence-length as a characteristic. It is necessary, I think, to insert the 
condition that the author shall be dealing with the same sort of material in the 
same sort of way, since (again judging from general impressions) it seems clear 
that sentence-length may be affected by the author’s matter as well as by his 
individuality: argumentative passages, for example, may well tend to longer 
sentences than matter purely descriptive.* 

The four authors chosen as illustrations are Bacon, Coleridge, Lamb and 
Macaulay; and their works, Bacon’s Essays, Coleridge’s Biographia Literaria, 
Lamb’s Elia and Last Essays of Elia, and Macaulay’s Essays. The particular 
editions used are not probably of any importance in this instance but are cited 
in the references at the end of the paper. They were simply those that I happened 
to have on my shelves. 

The fundamental tables, all in the same form and showing the numbers of 
sentences with 1 to 5, 6 to 10, 11 to 15 words, and so on, are given in the 
Appendix. 
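
The grouping into classes of 1 to 5, 6 to 10, 11 to 15 words and so on can be sketched as follows. The function names are illustrative; only the class width of five words comes from the text.

```python
from collections import Counter


def class_interval(length, width=5):
    """Return the (lower, upper) class containing a sentence of this length."""
    if length < 1:
        raise ValueError("a sentence has at least one word")
    lower = width * ((length - 1) // width) + 1
    return (lower, lower + width - 1)


def frequency_table(lengths, width=5):
    """Count sentences per class interval, in ascending order of class."""
    counts = Counter(class_interval(n, width) for n in lengths)
    return dict(sorted(counts.items()))
```

A sentence of 5 words thus falls in the class (1, 5) and one of 6 words in (6, 10), so that each class holds exactly five possible lengths.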

Table A gives the data derived from Bacon’s Essays. Here, when I had got 
to the end of Essay XXVI, “Of Seeming Wise”, I judged myself to be about 
half-way, and called this batch of 462 sentences sample A: I then proceeded 
to the end of Essay LI, “Of Faction”, and as this had given me 474 sentences, 
or approximately the same number, I called it sample B. The total number 
of essays being 58, the two samples together cover almost 90% of the 
essays. Table A shows, in addition to the distributions for the two samples 


* Compare, for example, in Hazlitt’s Lectures on the English Comic Writers, the style of the first 
essay “On Wit and Humour” with that of the subsequent lectures on definite groups of writers. 
See also below, section IV, for some remarks on Petty. 

A and B, the total distribution for the two together. From inspection it will be 
clear that the two samples are very concordant, though figures are inevitably 
slightly irregular and fluctuating. In both the frequencies increase rather 
abruptly in the interval 11-15; in both they reach a maximum in the interval 
31-35, and then tail away very slowly indeed, so that there is a considerable 
number of sentences of 101-200 words in length and a few over 200. The record is 
a sentence of 311 words, as punctuated, i.e. from full stop to full stop. The reader 
will find it in the penultimate paragraph of Essay XXVII, “Of Friendship”. 
It might well be broken up: but I do not think at this early stage I had attempted 
any revision of punctuation, hardly having realized the difficulty mentioned in 
the preceding section. 

Table B gives the data from Coleridge’s Biographia Literaria. I began at the 
beginning and continued to about the middle of chapter ix, when I had a batch 
of just over 600 (actually 601) sentences, which I judged sufficient: this is 
sample A. For sample B I meant to take a similar batch from near the end and 
began with chapter xx in vol. II, not noticing that a great part of the remainder 
of this volume consisted of “Satyrane’s Letters”. The result was that chapter xx 
to the end gave me only about half the number of sentences wanted, and to 
complete the sample I went back to the beginning of the volume (chapter xiv) 
and worked on from that point to about the middle of chapter xvi. This gave 
me sample B of 606 sentences. Again, inspection of the table shows that the 
distributions for samples A and B are closely alike and somewhat different from 
those of Table A. The actual maximum frequency occurs earlier, at 26-30 for 
sample A, and 21-25 both for sample B and for the two samples together; and 
the distribution is less scattered, there being a smaller proportion of the very 
long sentences of over 100 words in length. With Biographia Literaria the 
quotation difficulty became at times acute: a page or two, or a shorter passage, 
was omitted here and there to evade it. 

The data derived from Lamb’s essays are given in Table C. Sample A was 
taken from Elia (1st edition, 1823), from the beginning to some two-thirds of the 
way through “Mrs Battle’s Opinions on Whist”. Sample B was drawn from the
middle of the Last Essays of Elia (1st edition, 1833), starting with the essay
“Detached Thoughts on Books” and continuing to the end of “Barbara S—”.
Once more, the general consistence of the two samples looks quite satisfactory. 
Short sentences are much more frequent than with Coleridge, and the greatest 
frequencies occur in the intervals 6-10 and 11-15, which are almost equally
frequent. 

Finally, in Table D we have the data from Macaulay's Essays. Sample A was 
taken from the beginning of the essay entitled “Lord Bacon” (1837): sample B
from the beginning of the essay on the Earl of Chatham (1844). In this instance 
the two samples do not agree quite so well as in previous tables. The first three 
frequencies are quite concordant and agree in placing the maximum frequency 
at sentences of 11-15 words. But thereafter the frequencies of sample B exceed
those of sample A right up to the interval 46-50, after which the position is 
reversed, so that the second sample is less scattered than the first. But the 
difference is not great. 

So far we have dealt only with the similarities and differences suggested by 
brief inspection of the tables, but it is desirable to summarize in terms of statistical 
measures. Distributions of this kind, with long tails in which rather wild outliers 
may occur, might, it seemed to me, be best dealt with by the method of per- 
centiles. While therefore I have calculated the arithmetic means as the most 
familiar form of average, I have also given the median, and for the rest have 
contented myself with the lower and upper quartiles $Q_1$ and $Q_3$, the interquartile
range $Q_3 - Q_1$ as a measure of dispersion, and the ninth decile $D_9$ as an index to
the extension of the tail of the distribution. These percentiles are calculated on
the usual convention that the intervals may be regarded as 0.5 to 5.5, 5.5 to 10.5,
10.5 to 15.5, etc., and the distribution treated as continuous.*

These constants, for Tables A to D, are given in Table I. The table brings out
very well the degree of consistence of each author with himself, and his
differences from the others. For samples A and B of Bacon, mean, median, lower
quartile and interquartile range agree within less than a unit, upper quartiles
differ by 1.5 units and ninth deciles by 2.4, no very great difference from the
practical standpoint especially in the constants most affected by fluctuations of
sampling. For Coleridge, the two samples differ by between 1 and 2 units in the
case of mean, median and lower quartile; the upper quartiles differ by 3.3,
the interquartile ranges by 2.1 and the ninth deciles by 4.2. For Lamb the
differences are less than a unit in the case of mean, upper quartile and
interquartile range, the difference is exactly a unit for the two lower quartiles, 1.3
units for the medians, and 3.6 units for the ninth deciles. For Macaulay the


* As offprints at least of this paper may fall into the hands of some who are not statisticians, I 
may be forgiven for a note of explanation. The arithmetic mean is the common form of average, the 
sum of the quantities to be averaged divided by their number. Given a frequency distribution, it 
is calculated on the assumption that all observations falling into any one interval have the mid- 
value of that interval, e.g. that all sentences in the interval 6-10 are eight words long: this gives
quite a close approximation. The lower quartile is the sentence-length such that one quarter of all 
sentences are shorter and three quarters longer. But sentence-lengths are discontinuous: sentences 
of 25 words or less might be less than a quarter of the whole, sentences of 26 words or less more than 
a quarter; hence some convention is necessary if a precise value is to be stated. The convention is 
that given in text above, and we proceed by simple interpolation. Thus in the total distribution of 
Table A the total number of sentences is 936, one quarter of which is 234. The first four frequencies 
up to and including sentences of 25 words, or up to the conventional limit 25.5, give a total of 212,
and accordingly we require 22 more. There are 85 in the next interval, which is an interval of five
words, and the lower quartile is therefore approximately

$$25.5 + \frac{22}{85} \times 5 = 26.8.$$


The upper quartile, the value exceeded by only one-quarter of the observations, and the ninth 
decile, the value exceeded by only one-tenth, are similarly determined. 
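
The interpolation described in this note can be sketched in a few lines of Python. This is only an illustration and not part of the original paper: the function name and its parameters are assumptions introduced here, and the figures are those quoted in the note for the total distribution of Table A.

```python
def interpolated_percentile(lower_limit, cum_below, class_freq, class_width, target_rank):
    """Linear interpolation inside the class interval that contains the target rank.

    lower_limit : conventional lower limit of the interval (e.g. 25.5)
    cum_below   : number of observations below that limit
    class_freq  : number of observations inside the interval
    class_width : width of the interval in words
    target_rank : rank sought (e.g. one quarter of the total for the lower quartile)
    """
    needed = target_rank - cum_below
    if not 0 <= needed <= class_freq:
        raise ValueError("target rank does not fall in this interval")
    return lower_limit + needed / class_freq * class_width


total_sentences = 936
lower_quartile = interpolated_percentile(
    lower_limit=25.5, cum_below=212, class_freq=85, class_width=5,
    target_rank=total_sentences / 4,
)
print(round(lower_quartile, 1))  # 26.8
```

The same function would serve for the upper quartile and the ninth decile by passing three quarters or nine tenths of the total as the target rank, together with the limits and frequencies of the interval in which that rank falls.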
TABLE I

Constants for the distributions of sentence-length in samples from Bacon, Coleridge,
Lamb and Macaulay (Tables A, B, C and D of Appendix. $Q_1$ = Lower
Quartile, $Q_3$ = Upper Quartile, $D_9$ = Ninth Decile)

                       Bacon                    Coleridge
                 A       B     Total       A       B     Total
Mean           48.4    48.5    48.5      41.2    39.5    40.3
Median         39.4    39.4    39.4      35.7    34.2    34.9
$Q_1$          27.2    26.4    26.8      22.9    21.8    22.3
$Q_3$          61.7    60.2    60.9      53.2    49.9    51.3
$Q_3 - Q_1$    34.5    33.8    34.1      30.3    28.1    29.0
$D_9$          89.5    91.9    91.0      74.5    70.3    73.1

                       Lamb                     Macaulay
                 A       B     Total       A       B     Total
Mean           26.2    26.3    26.2      22.8    21.4    22.1
Median         18.3    19.6    19.1      18.2    18.9    18.6
$Q_1$          10.5    11.5    11.0      11.5    12.0    11.7
$Q_3$          33.3    34.0    33.7      28.2    27.5    27.8
$Q_3 - Q_1$    22.8    22.5    22.7      16.7    15.5    16.1
$D_9$          57.5    53.9    54.9      44.2    39.1    40.6


constants seem almost more self-consistent than inspection of the table would 
lead one to expect. The differences are, for means 1.4, medians 0.7, lower
quartiles 0.5, upper quartiles 0.7, interquartile ranges 1.2, ninth deciles 5.1: the
lessening of the scatter has affected mainly the ninth decile. For Coleridge all the 
constants given are lower than the corresponding constants for Bacon, the 
differences being most conspicuous for the upper quartile and the ninth decile. 
Comparing Lamb and Macaulay, medians and lower quartiles are much the 
same, but Macaulay’s mean, upper quartile, interquartile range and ninth decile 
are appreciably lower than the corresponding figures for Lamb. 

We may conclude accordingly that sentence-length is a characteristic of an
author’s style. There is no discrepancy between the results of our statistical 
investigation and the judgement made from general impressions. Given similar 
material and mode of treatment, an author’s frequency distribution of sentence- 
lengths does remain constant within fairly narrow limits. At the same time, it 
must be admitted, the limits cannot be precisely defined. In case of dispute as 
to whether two works are or are not by the same author, a judgement based on 
frequency distributions of sentence-lengths for the two must in the end be a 
personal one, and founded on such differences as are observed between samples
from works known to be by the same author. Hence the importance of the 
illustrations that have been given. 

The test is numerical, but not exact. For there can be no question of 
applying the ordinary tests based on the theory of simple sampling. The 
“samples” we have taken are in no sense random samples: they are continuous
passages, or collections of continuous passages, and if (as was my practice) 
the lengths of sentences are written down in order as they occur it is very clear 
that the resulting numerical series is not a random series but a “clumped” 
series. Short sentences tend to occur together. The tendency is much clearer 
for some authors than for others and for Macaulay is a characteristic trick of 
style, a point being emphasized by a series of hammer-blows from sentences 
of very few words: for example, 


These are the old friends who are never seen with new faces, who are the same in 
wealth and in poverty, in glory and in obscurity. With the dead there is no rivalry. In the
dead there is no change. Plato is never sullen. Cervantes is never petulant. Demosthenes 
never comes unseasonably. Dante never stays too long. 


Or again, 


The two sections of ambitious men who were struggling for power differed from each 
other on no important public question. Both belonged to the Established Church. Both 
professed boundless loyalty to the Queen. Both approved the war with Spain. 


It is obvious that a series formed from the lengths of such sentences is not a 
random one and that consequently differences between samples taken as we have 
taken them may greatly exceed the limits of simple sampling without, for practical 
purposes, being of any real significance. The differences between the upper 
quartiles and between the ninth deciles of the two samples from Coleridge, for 
example, are 10 or 11 times the standard errors, but cannot be regarded as very 
material. 
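
The counting procedure, and the clumping of short sentences that it reveals, can be illustrated with a short Python sketch applied to the first passage from Macaulay quoted above. This is not the author's own working, which was done by hand: the function names are hypothetical, and the code assumes that every full stop ends a sentence, which would fail on abbreviations and on the punctuation difficulties discussed in the text.

```python
from collections import Counter


def sentence_lengths(text):
    """Return the number of words in each sentence, counted from full stop to full stop."""
    sentences = [s for s in text.split(".") if s.strip()]
    return [len(s.split()) for s in sentences]


def group_in_fives(lengths):
    """Tally sentence-lengths into the intervals 1-5, 6-10, 11-15, ..."""
    tally = Counter((n - 1) // 5 for n in lengths)
    return {f"{5 * k + 1}-{5 * k + 5}": tally[k] for k in sorted(tally)}


passage = (
    "These are the old friends who are never seen with new faces, who are the same in "
    "wealth and in poverty, in glory and in obscurity. With the dead there is no rivalry. "
    "In the dead there is no change. Plato is never sullen. Cervantes is never petulant. "
    "Demosthenes never comes unseasonably. Dante never stays too long."
)

lengths = sentence_lengths(passage)
print(lengths)                  # [26, 7, 7, 4, 4, 4, 5]
print(group_in_fives(lengths))  # {'1-5': 4, '6-10': 2, '26-30': 1}
```

Written down in order, the lengths show one long sentence followed by a run of six very short ones, which is the kind of non-random sequence described in the text.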

One point regarding the form of these distributions may be noted as of 
interest to the statistician. They are not of the Poisson type but of the type in 
which the square of the standard deviation largely exceeds the mean. The 
following are the figures for the total distributions, the unit being a word: 


              $M$     $\sigma^2$   $\sigma$
Bacon        48.45    1048.22      32.38
Coleridge    40.34     677.10      26.02
Lamb         26.25     514.14      22.68
Macaulay     22.07     230.04      15.17
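
As a rough check on the statement that these distributions are far from the Poisson type, the ratio of variance to mean can be computed from the figures just given; for a Poisson distribution the ratio would be 1. The sketch below is an illustration added here, with variable names that are assumptions rather than anything in the paper.

```python
import math

# (mean, variance, standard deviation) for the total distributions, in words.
totals = {
    "Bacon": (48.45, 1048.22, 32.38),
    "Coleridge": (40.34, 677.10, 26.02),
    "Lamb": (26.25, 514.14, 22.68),
    "Macaulay": (22.07, 230.04, 15.17),
}

ratios = {}
for author, (mean, variance, sd) in totals.items():
    ratios[author] = variance / mean
    # The tabulated standard deviation should agree with the square root of the
    # variance to within rounding of the last printed figure.
    assert abs(math.sqrt(variance) - sd) < 0.01
    print(f"{author:10s} variance/mean = {ratios[author]:6.2f}")
```

In every case the variance is more than ten times the mean, so a Poisson model would greatly understate the scatter of sentence-lengths.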


I now pass on to an application of the method to a case of disputed authorship 
Section III. THE AUTHORSHIP OF THE DE IMITATIONE CHRISTI:
THOMAS À KEMPIS AND GERSON

Although the old controversy as to the authorship of the Imitatio still
continues, and only last year a translation from Netherlandish texts was
published in America (ref. 7) attributing it to Gerald Groote, the founder of the
Brothers of the Common Life, few I believe will not hold it to have been definitely
settled in favour of Thomas à Kempis. That certainly is my belief. Any reader
who wants to know more of the evidence will find a brief summary in ref. 11, or
a more detailed treatment in refs. 10, 12 and 13. If this does not suffice he can
follow up De Backer’s bibliography, ref. 14. But I thought it would be of some
interest to see what results the present method would yield when applied to
investigate the respective claims of Thomas à Kempis and one of those to whom
the authorship was formerly attributed, Jean Charlier de Gerson, Chancellor of
the University of Paris. That Gerson could have written the book seems plainly
impossible since, apart from all questions of style, it was clearly written by one
who was living the monastic life; but many early editions of the book bear his
name, and in others the Imitatio is followed by Gerson’s tractate De Meditatione
Cordis almost as if it formed part of the same work.

Since many works of Thomas are extant, admitted as such even by those who 
deny his authorship of the Imitatio, we can deal with two problems: (1) does the 
distribution of sentence-length in the Imitatio resemble that in (other) admitted 
works by Thomas, or no?; (2) does the distribution of sentence-lengths in the 
Imitatio resemble that in the works of Gerson? 

The edition of Thomas’s works that I used was that of Pohl (ref. 8). In this
edition the four books of the Imitatio are (to retain the usual numbering) placed,
as in the Brussels autograph MS., in the order I, II, IV, III. The four books are
of very different lengths, covering in this edition some 51, 29, 47 and 120 pages
respectively. To get a sample fairly distributed over these books, in rough
proportion to their respective lengths, I took ten subsamples of about 120
sentences each as follows: Lib. I, two, from the beginning and from near the end;
Lib. II, one, from about the middle; Lib. IV, two, from the beginning and from
near the end; Lib. III, five, distributed through the book. The subsamples from
books I, II and IV together form sample A of Table E in the Appendix, and the
five from book III, sample B. Sample B contains a rather higher proportion of
very short sentences, but otherwise A and B are reasonably concordant. There
was comparatively little trouble with the sentence-problem: Thomas was careful
in punctuation, which may be taken as his own. But one point may be noted
which occurs both in the Imitatio, in the miscellaneous works and in Gerson: it is
a question arising from the punctuation of quotations or sayings. The following
from the Soliloquium Animae will serve as an illustration:

Caeli dixerunt. Pertransivit nos et ascendit: invaluitque supra nos. Terra respondit.
Si caeli caelorum non capiunt: nolite me interrogare. Stellae cecinerunt: tenebrae sumus
et non lux si illuxerit. Mare contremuit et ait. Non est in me: et abyssus ignoravit.
Here there is a full stop after dixerunt, respondit, ait, before the words spoken
are given, although after cecinerunt only a colon. In all cases, it seems to me, the
words spoken or quoted should be counted in with the preceding words as if
there was only a colon. Further, in Lib. III I have to confess to a piece of
carelessness. A number of chapters in this book begin with the vocative “Fili.”
followed by a full stop. This should, I think, clearly be counted with the words
following: in a translation it would be followed only by a comma. But at first
I had entered the word as a one-word sentence, and did not realize that the point
was important since this introduction was frequent. To have left things as they
were would have created a misleading number of one-word sentences: to have
revised the numbers of words in all the initial sentences of the chapters affected
would have entailed more labour in altering tables than I was inclined to
undertake. Finally, I simply struck out all these occurrences of initial “Fili”, of
which there were sixteen. Sentences in the Imitatio being very short, my original
distributions were booked up ungrouped, and this made the number of “1's”
very conspicuous.

The sample to represent the miscellaneous admitted works of Thomas à
Kempis was similarly made up from ten subsamples of about 120 sentences each
taken from the following:

(1) De tribus Tabernaculis.

(2) Epistula ad quendam Cellerarium.

(3, 4) Soliloquium Animae.

(5) Meditatio de Incarnatione Christi.

(6) Sermones de Vita et Passione Domini.

(7) Hortulus Rosarum.

(8) Vallis Liliorum.

(9, 10) Sermones ad Novicios.

The first five form sample A and the second five sample B of Table F. Sample A
in this instance has more very short sentences, of ten words or less, than sample B,
but the two are otherwise very much alike, and also resemble the distributions
of Table E for the Imitatio. More exact comparison by the means, quartiles, etc.,
may be postponed till we make the summary comparison with the works of
Gerson also. It is a small matter, but it may be mentioned that the “texts” of
sermons were omitted.

The edition of the works of Gerson that I used (ref. 9) is in four parts folio, and 
a selection for a sample had to be made from this rather appalling mass, a duty 
which could have been better performed by someone less ignorant of his work
than myself. I tried to scatter the ten subsamples of about 120 sentences well 
over the four parts, to avoid matter that seemed hardly continuous prose or 
very exceptional in style and to choose matter that, in title at least, might not 
be too remote from something that Thomas might have treated. To reject 
something as “exceptional in style” may seem a dangerous proceeding, but I
have in mind actually only one particular rejection, that of De Modo Vivendi 
Omnium Fidelium. I put this down at first from its title but threw it out after
examination. It consists of a series of brief rules, stated in curt sentences, after 
this style: 

Regula virginum. Non sint loquaces, sed simplices corde et habitu. Ad virginitatem 
matris Christi cogitent et eam diligant. Choreas vitent. Inter iuuenes non sedeant, nec se
ab eis palpari permittant. Non ament aliquem illicito amore. Adulatores neque adulatrices
recipiant nec audiant. Orationes libenter dicant. Sordida verba et inhonesta fugiant. 


I hope it will be agreed that this is not normal prose—there is no continuity of 
thought nor development of ideas—but an exceptional tour de force, and was 
legitimately rejected. My subsamples were taken from the following: 

(1) Sermo factus in die circumcisionis Domini coram Papa apud Tarasconem. 

(2) Tractatus contra sectam flagellantium se. (A bad choice, as it is impossible
to imagine Thomas à Kempis choosing such a subject.) As this proved too brief
to give 120 sentences, sufficient was added from Tractatus de probatione spirituum.

(3) Tractatus de parvulis trahendis ad Christum.

(4) Sermo de vita clericorum. 

(5, 6, 7) De consolatione theologiae. This is modelled on Boethius, De consola- 
tione philosophiae. The three subsamples were taken from the beginning, middle 
and end. Verse was of course omitted. 

(8) De meditatione cordis: the whole. As this gave only 109 sentences, on my 
reckoning, the deficiency was made up on the next two. 

(9) Sermo de circumcisione. 

(10) Tractatus de consolatione in mortem amicorum. 

The first five form sample A of Table G, the second five sample B. It will be
seen that the two are almost remarkably consistent with one another. I should
add that I found the sentence difficulty distinctly troublesome at times with this 
edition of Gerson: full stops seem used too frequently and other punctuation 
marks inadequately. This impression was confirmed by the comparison men- 
tioned in section I. 

Finally, I decided to try an experiment with a different technique, pitching 
on columns by a random process and taking a sample of the same number of 
sentences from each. The parts or volumes I was using are numbered by columns, 
and the numbers of columns in these several volumes are as follows: 


I.   934      III. 1190
II.  878      IV.   982


a total of nearly 4000 columns. Eliminating for simplicity the last 191 columns 
of Part III, any column can be specified by a number under 5000, the first digit 
giving the number of the Part, the last three digits the column; thus 2625 gives 
col. 625 of Part II, 4063 col. 63 of Part IV. Sequences of four consecutive 
numbers beginning with a 1, 2, 3, or 4 were then extracted from Tippett’s
Random Numbers and these taken as determining columns for samples. Numbers
beyond the limits given above for Parts I, II and IV were simply dropped. But
numbers might also be rejected for other reasons: (1) the column might be verse;
(2) it might contain matter not by Gerson at all, or only doubtfully by him; 
(3) the matter might be deemed otherwise unsuitable, i.e. hardly ordinary prose 
(cf. the rejection on the first sampling). I found it in fact quite impossible 
altogether to avoid the element of personal judgement and doubt now if it was 
desirable to attempt it: the point is discussed at the end of section IV. Relatively
little was, however, rejected under the last head and the ground covered was, I 
think, more varied than before. When the column was fixed, I started with the 
first sentence beginning therein and continued straight ahead until 20 sentences 
had been counted. Samples A and B of Table H are therefore founded on 30
such “random passages” each, and the total column on 60 “random passages”.
If the “total” columns of Tables G and H are compared, it will be seen that they
are closely similar. 
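
The numbering scheme for picking columns can be sketched as follows. This is an illustration only: a pseudo-random generator stands in for Tippett’s Random Numbers, the function names are hypothetical, a column number of 000 is assumed to be rejected, and the rejections made on grounds of content (verse, doubtful authorship, unsuitable matter) are not modelled because they rested on personal judgement.

```python
import random

# Usable columns in each Part; Part III is reduced to 999 by dropping its last 191 columns.
USABLE_COLUMNS = {1: 934, 2: 878, 3: 1190 - 191, 4: 982}


def decode(number):
    """Split a four-digit number into (part, column); return None if it is out of range."""
    part, column = divmod(number, 1000)
    if part not in USABLE_COLUMNS:
        return None
    if not 1 <= column <= USABLE_COLUMNS[part]:
        return None
    return part, column


def draw_columns(how_many, rng):
    """Draw four-digit numbers until the required number of valid columns is found."""
    picks = []
    while len(picks) < how_many:
        hit = decode(rng.randrange(1000, 5000))
        if hit is not None:
            picks.append(hit)
    return picks


print(decode(2625))  # (2, 625): col. 625 of Part II
print(decode(4063))  # (4, 63): col. 63 of Part IV
print(decode(1950))  # None: Part I has only 934 columns
starting_columns = draw_columns(30, random.Random(1939))
```

Each of the 30 starting columns would then supply 20 consecutive sentences, giving 600 sentences for one sample of Table H.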

If now Tables E and F for the Imitatio and the admitted miscellaneous works 
of à Kempis are compared with the Tables G and H for Gerson, it will be seen
that there are very considerable differences, especially in the numbers of long or 
moderately long sentences, e.g. of more than 50 words. In Tables E and F 
these number 15 and 22 respectively; in Tables G and H they total to 68 and 66. 
For facility of checking, frequency distributions were booked up in the sub- 
samples of about 120 sentences, and it is natural to enquire how far such small 
subsamples show consistent differences: it is obvious that no high degree of 
consistence is to be expected. The following are the numbers of sentences of
51 words or more in the subsamples of à Kempis and Gerson respectively, ranked
in order of magnitude:

à Kempis: 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 5, 6, 7.
Gerson: 1, 2, 2, 3, 4, 5, 5, 5, 6, 6, 6, 7, 20, 12,


The upper quartile for Thomas à Kempis is 2.5, and this is exceeded by 17 of the
20 subsamples for Gerson. Seven of the subsamples for Thomas have no sen- 
tences at all of such a length: there is no subsample from Gerson without at least 
one. In both the range of variation exceeds, as one would expect, the value 
that would be given by the theory of simple sampling. On that theory the
variance should be approximately equal to the mean, but the means and
variances are:

à Kempis: $M$, 1.85; $\sigma^2$, 4.33
Gerson: $M$, 6.70; $\sigma^2$, 11.61


Roughly, fluctuations of simple sampling account for about half the variance in 
each case. 
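
The figures for à Kempis can be verified directly from the ranked series given above. The sketch below is an added illustration with assumed variable names; the upper quartile is taken midway between the 15th and 16th of the 20 ranked values, a convention that reproduces the stated 2.5, and the Gerson ratio uses the printed mean and variance rather than the series.

```python
from statistics import mean, pvariance

# Numbers of sentences of 51 words or more in the 20 subsamples from à Kempis.
kempis = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 5, 6, 7]

m = mean(kempis)
v = pvariance(kempis)  # variance with divisor n
ordered = sorted(kempis)
upper_quartile = (ordered[14] + ordered[15]) / 2
print(m, v, upper_quartile)  # 1.85 4.3275 2.5

# Under simple sampling the variance would be about equal to the mean, so the
# ratio mean/variance estimates the share of variance due to simple sampling.
kempis_share = m / v
gerson_share = 6.70 / 11.61
print(f"{kempis_share:.2f} {gerson_share:.2f}")  # 0.43 0.58
```

The two ratios, about 0.43 and 0.58, lie on either side of one half, in line with the rough statement in the text.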


The complete comparison by means, quartiles, etc. is given in Table II.
Comparing first the constants for the miscellaneous works of Thomas à Kempis
with those for the Imitatio, and looking at the columns for samples A and B in
both cases, we see that the values of the means overlap, that for sample A of the
TABLE II

Constants for the distributions of sentence-length in samples from the Imitatio
Christi, from Miscellaneous admitted works of Thomas à Kempis, and from
Gerson. (Tables E, F, G and H of Appendix. $Q_1$ = Lower Quartile, $Q_3$ = Upper
Quartile, $D_9$ = Ninth Decile)

                   Imitatio Christi           à Kempis: Misc.
                 A       B     Total       A       B     Total
Mean           17.0    15.4    16.2      16.6    19.3    17.9
Median         14.0    13.6    13.8      13.8    16.4    15.1
$Q_1$          10.6     9.5    10.1       9.7    11.9    10.6
$Q_3$          20.7    18.4    19.3      20.8    23.9    22.4
$Q_3 - Q_1$    10.1     8.9     9.2      11.1    12.0    11.8
$D_9$          28.6    26.0    27.7      29.3    32.5    31.0

                   Gerson: Selected           Gerson: Random
                 A       B     Total       A       B     Total
Mean           23.5    23.4    23.4      23.5    22.0    22.7
Median         19.4    19.9    19.6      19.3    18.4    18.9
$Q_1$          12.5    12.6    12.5      12.0    11.4    11.7
$Q_3$          32.0    30.4    31.3      30.9    27.9    29.5
$Q_3 - Q_1$    19.5    17.8    18.8      18.9    16.5    17.8
$D_9$          45.3    43.1    44.0      43.5    43.5    43.5


Miscellanea lying between the two values for the Imitatio. The values for the
median and for the lower quartile overlap similarly. For the upper quartiles,
the lower value for the Miscellanea, viz. 20.8, only just exceeds the upper value
for the Imitatio, viz. 20.7; and there is a similar but slightly greater difference
in the case of the interquartile range and the ninth decile. In no case are the
differences at all large. The two tables for Gerson show a very similar degree of
consilience.

But comparison of the constants for the Imitatio and the Miscellanea of
Thomas à Kempis with those for Gerson’s works shows quite a different state of
affairs. For the lower quartile alone the differences are not large nor consistent,
the lower quartile for sample B of the “random passages” from Gerson lying
within the range of the lower quartiles for the Miscellanea of à Kempis and the
Imitatio. All the remaining constants in the lower part of Table II are
consistently larger than those in the upper part, and the differences are the more
conspicuous the more the value of the constant is affected by long sentences:
it is largest (11 to 19 words) for the ninth decile, and next largest (4 to 14 words)
for the upper quartile.

These results are completely consonant with the view that Thomas à Kempis
was, and Jean Charlier de Gerson was not, the author of the Imitatio.


Section IV. GRAUNT’S OBSERVATIONS UPON THE BILLS OF MORTALITY
AND THE ECONOMIC WRITINGS OF SIR WILLIAM PETTY


The problem of the authorship of the Observations upon the Bills of Mortality 
is, in all probability, of more interest to readers of this Journal than that of 
section III. At the same time it cannot be treated so completely as the problem
of that section, for we have no other and admitted works by John Graunt with 
which to make comparison: we can only compare the one work which is generally 
believed to be by him with the admitted works by Sir William Petty. 

The edition that I used both for the Observations and for Sir William Petty’s 
writings was the convenient edition of Hull (ref. 15). Graunt gave me a certain 
amount of trouble in delimiting sentences, but the trouble was far more serious 
with Petty. I should like to quote, but the editors might reasonably object to my 
quoting several sentences each two or three hundred words or so in length: 
I must therefore merely refer readers to the original for illustrations. The longest 
sentence (as I reckoned it) in the Observations is the first part of §4, Chapter vii
(ref. 15, vol. II, pp. 370-1). Here it seemed to me that the colon after “above-
mentioned” on line 11 of p. 371 should be replaced by a full stop. This still
leaves the sentence one of 213 words. On the other hand it appeared to me that
the next following full stop between “Annum” and “And” on line 15 ought to
be a comma, making the resulting sentence 70 words. This seemed a fairly clear 
case. 

Take for comparison the longest sentence (again, as I reckoned it) in the 
samples from Petty, quite a characteristic loosely organized sequence of
paragraphs in Chapter iv of the Political Arithmetic (ref. 15, vol. I, pp. 295-6).
I allowed this sentence to begin with the words “To which purpose”, the initial
words in the last paragraph at the foot of p. 295, in spite of the relative adjective;
but all the nine paragraphs beginning with “The value” on p. 296 had, it seemed
to me, to be reckoned as part of the sentence, for the last alone possesses a
verb. The result is that the sentence, on my reckoning, only stops at the words
“Eighty thousand pounds” which close the paragraph towards the foot of
p. 296. This is, I think, a lenient and doubtful reckoning. The first paragraph
beginning “To which purpose” might well be taken as merely a relative clause
properly belonging to the preceding paragraph, the sentence really beginning
with the words “Now the Wealth of every Nation” in that paragraph, replacing
the colon preceding “Now” by a full stop. This would add another 71 words to
the 257 as I reckoned it in my work. Moreover, the paragraph following my
terminal limit on p. 296 leads off with “Which computation”: this then might
also be reckoned as a relative clause forming part of the same sentence, right
down to the concluding words “Forty Five Millions”, and adding yet another
105 words. On this computation then I ought to have reckoned the sentence as 
one of 433 words! This may sound almost incredible, but the sentence would 
really be no more than an expansion of a construction like this: 


Now, the wealth of a nation consisting chiefly in its share of the foreign trade of the 
world we have to consider whether the English or the French have the greater per capita
share of that trade; to which purpose I have estimated that the total value of the exports 
from Great Britain and Ireland, America, Africa, the East Indies, etc. amounts to some 
ten million pounds, a computation sufficiently justified by the Customs returns with an 
allowance for smuggling etc. 


There is a special type of difficulty that occurs repeatedly, and may be 
illustrated by §11, Chapter vi of the Treatise of Taxes (ref. 15, vol. I, p. 56). The
paragraph starts “The Inconveniences of the way of Customs, are, viz.”, and
there then follow four numbered paragraphs with different grammatical
relations to the introductory clause, like this, to abbreviate greatly:

(1) That duties are laid upon [raw materials etc.].

(2) The great number of officers requisite. 

(3) The great facility of smuggling by bribery, etc. 
(4) The customs and duties amount to so little that some other way of levy 
must be practised together with it. 

No. 1 obviously forms part of the sentence with the introductory clause. 
Nos. 2 and 3 are not sentences as they stand, and ought to have been counted in 
also I think, but no. 4 is an independent sentence. Actually I find that in this 
case I do not seem to have obeyed my own rule that a word-sequence, to form a 
sentence, must be a grammatically complete expression of a thought, and 
nos. 2 and 3 were reckoned separately: this was, I believe, done in some similar
cases also. Indeed judging from the few instances where I have looked again at
my classification some time after the original work was done, I seem to have
been usually too merciful rather than too severe in placing the limits of the
sentence. Difficulties were far more frequent and more troublesome than with
any author I had tackled, and made the work both tedious and unsatisfactory, 
for far too much was thrown on my personal judgement. Hull says (ref. 15, 
pp. lxvii-lxviii): 

Unfortunately the use of rash calculations grew upon Petty, and as was to be expected, 
he gives widely varying estimates of the same things. It must be added that he is frequently
inaccurate in his use of authorities and careless in his calculations and upon at least one
occasion he is open to suspicion of sophisticating his figures. 


This is sufficiently severe but I would add that, in my opinion, Petty’s literary 
style, more especially in his argumentative writing, is loose and slovenly, indeed 
at times hardly grammatical. It is difficult to dissociate such slovenliness in 
writing from slovenliness of thought. Only in purely descriptive matter does his
style take on quite a different complexion. 


They have a great Opinion of Holy-Wells, Rocks, and Caves, which have been the 
reputed Cells and Receptacles of men reputed Saints. They do not much fear Death, if 
it be upon a Tree, unto which, or the Gallows, they will go upon their Knees toward it, from 
the place they can first see it. They confess nothing at their Executions, though never 
so guilty. In brief, there is much Superstition among them, but formerly much more than 
is now; for as much as by the Conversation of Protestents, they become asham’d of their 
ridiculous Practices, which are not de fide. As for the Richer and better-educated sort 
of them, they are such Catholicks as are in other places. (Political Anatomy of Ireland, 
Chap. xii: ref. 15, vol. I, pp. 199-200.)

That is both pithy and picturesque. 

So much for the difficulties; and now let us turn to the data. Graunt’s 
Observations form but a slim volume, and his sentences tend to be long: omitting 
all prefatory matter and the appendix, and also one or two passages with tabular 
matter that it seemed impossible to deal with in any other way, I obtained no 
more than 335 sentences in all. The distribution is shown in Table J of the 
Appendix. To give some notion of the consistence of the style throughout, I have 
also broken up the total into three approximately equal subsamples. These are 
so small, and the run of the figures inevitably so irregular, that no very close 
consilience can be expected; but the degree of consistence does not seem to be at 
all unsatisfactory, and is particularly close as regards the numbers of longish 
sentences. 

For facility of comparison, I thought it would be convenient to make the 
samples from Petty of the same size, and so intended: but, owing to a small 
revision made later in the Graunt table on looking through the work again, the 
totals for Petty are 334 against the 335 for Graunt. Sample A was taken mainly 
from the Political Arithmetic, as the work most closely associated with his name 
by statisticians. But this gave me only 300 sentences, and 34 were added from 
the Treatise of Taxes to make up the desired total. Sample B was taken wholly 
from the Treatise of Taxes. The distributions are given in Table K of the 
Appendix, and it will be seen that they are on the whole very concordant, 
with the exception that A shows a larger proportion of sentences of excessive 
length. If comparison be made with Table J it is obvious that these samples 
from Petty contain a very much larger proportion of long sentences than the 
Observations. There are only 17 sentences of 101 words or more in Table J, 54 and 
45 sentences of 101 words or more in samples A and B of Table K. It may be 
added that this difference shows itself even in small subsamples. In the sub- 
samples A, B and C of Table J there are 7, 6 and 4 such sentences. In corre- 
sponding subsamples of 111 or 112 sentences for samples A and B of Table K 
there are 24, 19, 11, 11, 15 and 19. 

When I had got so far, I thought it would be of interest to supplement 
samples A and B for Petty’s writings by a sample of “random passages” taken 
in the same sort of way as for Gerson in section III. Hull’s edition, though in 
two volumes, is paged continuously and runs only to 621 pages apart from 
appendices, index, etc.: omitting prefatory matter, the text of the first item (the 
Treatise of Taxes) does not start till p. 18. Pp. 314-438 are occupied by Graunt, 
with blank pages, title pages etc. I accordingly determined “random pages” by 
extracting from Tippett’s Random Numbers triplets of digits beginning with 0, 
1, 2, ..., 6, but not exceeding 621, and omitting numbers between the limits 
000-018 and 314-438. A considerable number of the pages so given had to be 
struck out as either being blank pages, or containing prefatory matter, titles, 
contents, etc., or something obviously unsuitable such as tabular or semi-
tabular matter. Very few were struck out as otherwise unsuitable, the only 
condition imposed being that the text should be fairly continuous ordinary prose, 
even though prose containing a good many figures: the limits were left as wide 
as possible. On each of 33 pages accepted I counted ten sentences, starting with 
the first complete sentence on the page and continuing till ten had been counted. 
On a supplementary 34th page I counted only four such sentences, so as to make 
up 334 sentences in all. We are dealing here with a much smaller range of 
numbers than in the Gerson experiment, and repetitions may occur: in fact, of 
the 55 numbers of three digits which were retained as lying within my limits and 
of which 22 were subsequently struck out as impossible or unsuitable, two 
occurred twice (one being amongst the subsequent rejections) and one three 
times. Two or three pairs might have been expected: the one occurrence of a 
triplet was unlikely. 
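
The remark about repetitions can be checked with a rough calculation. The sketch below assumes that the excluded limits 000-018 and 314-438 are inclusive and that the 55 retained numbers behave as independent uniform draws from the remaining page numbers; the variable names are illustrative only. Under these assumptions it gives about three coinciding pairs and about 0.11 coinciding triplets, in line with the statement in the text.

```python
from math import comb

# Page numbers that survive the exclusions described in the text.
# Assumption: both excluded ranges (000-018 and 314-438) are inclusive.
eligible = [p for p in range(0, 622) if not (0 <= p <= 18 or 314 <= p <= 438)]
n_pages = len(eligible)

draws = 55  # three-digit numbers retained as lying within the limits

# Expected numbers of coinciding pairs and triplets among `draws`
# independent uniform draws from `n_pages` possible values.
expected_pairs = comb(draws, 2) / n_pages
expected_triplets = comb(draws, 3) / n_pages ** 2

print(n_pages, round(expected_pairs, 2), round(expected_triplets, 3))
```

The pair count here includes the three pairs contained in any triplet, so it is only a rough guide.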

The data given by this experiment are shown in column C of Table K of the 
Appendix. It will be seen that the first part of this distribution differs quite 
appreciably from the corresponding portions of columns A and B, there being 
a larger number of short sentences. But the ‘tail’ of long sentences does not 
differ greatly, there being 40 sentences of 101 words or more in column C 
against 54 in column A and 45 in column B. The main source of the divergence is 
mentioned below, and the value of the sample discussed. 

Table III gives the brief summary comparison in terms of means, quartiles 
etc. Taking first the medians and lower quartiles, all the three medians for 
Petty are higher than the median for the total of the Observations, which is the 
comparable figure based on the same number of sentences, but the median for 
sample C of Petty is lower than the median for sample A (based on only 111 
sentences) of Graunt. A precisely similar statement is true for the lower 
quartiles. All the other constants, means, upper quartiles, interquartile ranges 
and ninth deciles are consistently higher for Petty than for Graunt, and the 
differences, especially for upper quartiles and ninth deciles, quite considerable. 
The distributions for the two authors seem to me completely differentiated: or, 
to put it otherwise, the results confirm other evidence that the actual authorship 
of the Observations is not the same as that of the economic writings of Sir William 
Petty. Lord Lansdowne remarked, in replying to Prof. Greenwood (ref. 18, 
sentence quoted in ref. 19): “For literary style, neither the Observations nor 
Petty’s writings are conspicuous, but I have yet to learn what differences can be 
detected between them in this respect.” Sentence-length is surely one characteristic 
of literary style, and the difference seems clear. In the wider sense of 
style, the sense in which le style c’est l’homme même, the Observations seem to me 
to differ wholly from Petty’s writings: they suggest a man of quite a different 
type of mind and quite a different character. The evidence from sentence-length 
is interesting, but adds very little. 

TABLE III 

Constants for the distributions of sentence-length in Graunt’s Observations 
and in samples from Petty’s Works. (Tables J and K of Appendix) 

                          Graunt                        Petty
Constant        A       B       C     Total       A       B       C
Mean           50.1    45.5    46.9    47.5      66.1    60.2    56.3
Median         45.2    33.0    37.4    40.1      56.9    51.3    44.0
Q1             31.2    23.8    26.3    26.8      36.1    34.7    29.0
Q3             63.3    55.5    65.5    62.3      83.2    79.0    73.7
Q3-Q1          32.1    31.7    39.2    35.5      47.1    44.3    44.7
D9             85.2    85.0    85.2    85.2     126.0   109.3   110.1
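
The constants of Table III can be recovered from the grouped frequencies by ordinary linear interpolation within classes. The sketch below applies this to the Total column of Table J (Graunt). It assumes class boundaries at the half-integers (35.5 to 40.5 for the class 36-40) and class mid-points for the mean; the function names are illustrative and not the author's. Under these assumptions the results agree with the Total column for Graunt in Table III to within rounding.

```python
# Total column of Table J (Graunt), keyed by the upper limit of each
# five-word class; classes containing no sentences are omitted.
GRAUNT_TOTAL = {
    10: 12, 15: 13, 20: 23, 25: 29, 30: 25, 35: 40, 40: 28, 45: 24, 50: 15,
    55: 23, 60: 15, 65: 12, 70: 15, 75: 12, 80: 8, 85: 8, 90: 8, 95: 3,
    100: 5, 105: 1, 110: 3, 120: 3, 125: 3, 130: 2, 140: 1, 155: 1, 160: 2,
    215: 1,
}


def grouped_quantile(freq, p, width=5):
    """Quantile of order p by linear interpolation within the class that
    contains it; the class ending at `upper` is taken to run from
    upper - width + 0.5 to upper + 0.5."""
    n = sum(freq.values())
    target = p * n
    cum = 0
    for upper in sorted(freq):
        count = freq[upper]
        if cum + count >= target:
            lower_boundary = upper - width + 0.5
            return lower_boundary + width * (target - cum) / count
        cum += count
    raise ValueError("p must lie in (0, 1]")


def grouped_mean(freq, width=5):
    """Mean with every sentence placed at the mid-point of its class."""
    n = sum(freq.values())
    return sum((upper - (width - 1) / 2) * c for upper, c in freq.items()) / n


median = grouped_quantile(GRAUNT_TOTAL, 0.5)
q1 = grouped_quantile(GRAUNT_TOTAL, 0.25)
q3 = grouped_quantile(GRAUNT_TOTAL, 0.75)
d9 = grouped_quantile(GRAUNT_TOTAL, 0.9)
print(round(grouped_mean(GRAUNT_TOTAL), 2), round(median, 2),
      round(q1, 2), round(q3, 2), round(d9, 2))
```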

To return in conclusion for a moment to the method of “random passages” 
in relation to this method of investigation, let me deal first with the reason for 
the divergence of sample C for Petty’s writings from the two samples A and B. 
The latter were taken wholly from the Political Arithmetic and the Treatise of 
Taxes. Examining my 33 samples of ten sentences each for sample C, I found 
that eight (including the triplet and the pair) which were remarkable for the 
proportion of short sentences all came from the Political Anatomy of Ireland. The 
distribution for these 80 sentences alone is totally different from that of sample A 
or sample B, the constants being as follows: mean, 34.8; median, 31.2; Q1, 24.7; 
Q3, 42.2; Q3-Q1, 17.5; D9, indeterminate within the blank range 59.5–62.5, say 
61. Why this difference? I have already mentioned the reason and illustrated it 
by a quotation from this very tract. The matter is purely descriptive, descriptive 
(in the samples concerned) of the religion, diet, clothes, language and manners of 
the people of Ireland, and of the Government, militia and defence of the country; 
and when Petty has only to describe and not to argue he can apparently write 
like a Christian.* The Observations being, I think one may say, mainly argumentative, 
this sample of “random passages” is not properly comparable with 
it: it does not deal “with the same sort of material in the same sort of way”, to 
quote the phrase from the beginning of section II. Ludicrously enough there 
really is no tract of Petty’s in which he does deal with the same sort of material 
in the same sort of way as Graunt, so the condition is strictly impossible of 
fulfilment: we did our best in taking samples from two tracts that were both 
argumentative, and these two samples were very fairly consistent with each 
other. 

* Webster and the O.E.D. concur in classifying this expression as “Colloq. or Slang”. But 
after all the early Christians, judging from both gospels and epistles, did write in short sentences. 

But this result raises the whole question of method: was I right in attempting 
something like random sampling at all? The notion that samples ought to be 
random is so firmly engrained in one’s mind that it seems almost sacrilegious to 
object to the application of the rule in a particular case. But after all the problem 
surely is not whether a tract passing under the name of Jones does or does not 
resemble, in this particular characteristic, a random sample from the writings of 
Brown, but samples from Brown’s writings dealing, so far as possible, “with the 
same sort of material in the same sort of way”. The method of “selected samples” 
is, from this standpoint, entirely justified and perfectly correct. A critic may, of 
course, object to the particular choice of selected samples (the particular choice 
in this section and the last for example): but the method is right, and preferable 
to the method of “random passages” as I used it, that is to say with as little 
restriction as possible in regard to matter and treatment. 

But there is this to be said. In the first place, used as I used it, the method 
does serve in some degree as a control and perhaps a warning. It brings out very 
well the apparent (comparative) homogeneity of Gerson’s style in respect of 
sentence-length, and the heterogeneity of Petty’s. In combination with selected 
samples it better exhibits all the facts. In the second place it might be used 
differently, just as much care being taken in deciding whether to accept or reject 
a passage given by the random numbers as in the case of the “selected samples”, 
but thereby obtaining a wider range of selection. 

Further, there is a danger in random sampling to which possibly I have not 
paid sufficient attention, the risk of bias in sampling arising from the varying 
lengths of sentences and the fact that the series of sentence-lengths, in order as 
they occur, is not a random one. To take a simple but extreme example, suppose 
our book consisted of equal numbers of pages containing respectively 30 sentences 
of 15 words each, and 15 sentences of 30 words each. Actually then the book 
would contain two sentences of 15 words to one of 30 words. But if we proceeded 
by the method used for obtaining “random passages” from Petty, 
taking only a sample of 10 sentences from each page determined by Tippett’s 
numbers, we would tend to get a sample containing equal numbers of sentences 
of the two lengths: the number of long sentences would be overweighted. The 
difficulty would be surmounted if we made the sample, not a fixed number of 
sentences, but a fixed length of matter, say one page: or, provided the pages in 
the book were arranged fairly at random, by making the sample long enough to 
cover a number of pages, like my subsamples of about 120 sentences. In fact of 
course no real case is as simple or extreme as this, and actually it will be remembered 
that the “random passages” sample from Petty (sample C) gave 
fewer long sentences and more short sentences than samples A and B, though this 
is no proof that it was not in some degree biased in the direction indicated. 
Some possible processes of sampling might easily lead to extreme bias of this 
type. Suppose, for example, we decided to make a random sample of single 
sentences, determining the page and the number of a word on the page by 
random numbers, and taking the sentence in which this word happened to fall. 
Then, it seems to me, the chance of a sentence being “caught” for the sample 
would be directly proportional to its length; for a sentence of 10 words would 
have ten chances of being caught and a sentence of 40 words forty chances. (The 
difficulty is closely analogous to that of determining size of family by asking 
casual people as to the number of their brothers and sisters.) The risk is much 
lessened, in my opinion, by taking longish samples and, of course, if we are 
mainly concerned with comparisons and not absolute figures, is less important, 
for the bias is unlikely to be very different in the two authors compared by the 
same method. The whole question of the best method to use for random sampling 
is, however, worth further discussion. So far as my own experience goes, 
however, I am inclined to prefer the method first used, the method of selected 
passages of considerable length. 
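
The extreme example above can be worked through exactly. The sketch below builds the hypothetical book described in the text (equal numbers of pages with 30 sentences of 15 words and with 15 sentences of 30 words) and computes the expected share of 30-word sentences under three schemes: ten sentences from a randomly chosen page, the sentence containing a randomly chosen word, and whole pages. The function names and the book size are illustrative assumptions.

```python
from fractions import Fraction


def make_book(pairs=100):
    """Hypothetical book: alternating pages of 30 fifteen-word sentences
    and 15 thirty-word sentences (each entry is a sentence length)."""
    return [[15] * 30, [30] * 15] * pairs


def share_long_true(book):
    """Actual proportion of 30-word sentences in the book."""
    lengths = [w for page in book for w in page]
    return Fraction(sum(w == 30 for w in lengths), len(lengths))


def share_long_fixed_count(book, k=10):
    """Expected share when a page is drawn uniformly at random and its
    first k sentences are taken."""
    per_page = [Fraction(sum(w == 30 for w in page[:k]), k) for page in book]
    return sum(per_page) / len(book)


def share_long_word_hit(book):
    """Chance that a word chosen at random (page uniform, then word
    uniform on the page) falls in a 30-word sentence."""
    per_page = [Fraction(sum(w for w in page if w == 30), sum(page))
                for page in book]
    return sum(per_page) / len(book)


def share_long_whole_pages(book):
    """Share when whole pages are taken: ratio of the expected number of
    30-word sentences to the expected number of sentences per page."""
    long_count = sum(sum(w == 30 for w in page) for page in book)
    return Fraction(long_count, sum(len(page) for page in book))


book = make_book()
print(share_long_true(book), share_long_fixed_count(book),
      share_long_word_hit(book), share_long_whole_pages(book))
```

In this example the true share of long sentences is one-third; the fixed-count and word-hit schemes both raise it to one-half, while sampling whole pages leaves it at one-third.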
APPENDIX OF TABLES 


These tables are all in the same form, showing the numbers of sentences 
having the length (in words) stated in the left-hand column, in a sample or 
samples from the source stated in the heading and more fully in the preceding 
text. Thus, in a sample taken from the first portion of Bacon’s Essays, column A 
shows that there was only one sentence (out of 462) of a length between 1 and 5 
words, 8 with a length between 6 and 10 words, 24 with a length between 11 and 
15 words, and so on. Blank lines have been omitted in the tails of the tables 
to save space. 
TABLE A 
Bacon’s Essays (1597-1625) 
A, first half to end of XXVI. B, second half to end of LI 


Sentences Sentences 
No. of No. of 
words | words 
Total A | Total 

1 2 3 121-125 3 4 7 

6- 10 8 8 16 126-130 2 3 5 
Ll- 15 24 25 49 131-135 2 l 3 
16- 20 22 23 45 136-140 1 2 3 
21- 25 46 53 99 141-145 3 2 5 
26- 30 43 42 85 146-150 — 1 1 
31- 35 57 55 112 151-155 1 2 3 
36- 40 38 37 75 
41- 45 24 38 62 166-170 
46— 50 31 25 56 
55 23 28 51 186-190 1 | 
56— 60 25 21 46 191-195 
61- 65 19 17 36 196-200 I — 1 | 
66- 70 12 13 25 
71- 75 19 27 211-215 l 
76- 80 7 ll 18 
81- 85 2 ll 23 226-230 

86- 90 6 7 13 231-235 

96-100 2 ll 13 311-315 os 1 1 
101-105 7 3 10 
106-110 es 12 
116-120 2 | 4 6 Total 462 | 474 | 936 


| | 
_| 


TABLE B 
Coleridge, Biographia Literaria (1817) 
A, vol. I to p. 134. B, vol. 1, pp. 1-66 and 104~-end (p. 182) 


Sentences Sentences 
No. of 
| words 

A B Total A B Total 
9 2 ll 101-105 4 6 10 
21 37 58 106-110 2 2 4 
46 44 90 111-115 1 1 2 
46 49 95 116-120 5 1 6 
58 73 131 121-125 2 3 5 
64 56 120 126-130 1 i 2 
55 57 112 131-135 1 1 2 
51 52 103 136-140 — 
49 52 101 141-145 — 2 3 
39 37 76 146-150 1 2 3 
24 29 53 151-155 —: 1 1 
22 23 45 156-160 —- 1 1 
21 18 39 161-165 1 — 1 
20 17 37 166-170 — — ins 
20 9 29 171-175 -— l 1 
6 9 15 196-200 1 1 
7 7 14 

5 3 8 Total 601 606 | 1207 

| 
TABLE C 

Charles Lamb, Elia (1823) and Last Essays of Elia (1833) 

A, Elia: from beginning to middle of Mrs Battle’s Opinions on Whist. B, Last Essays: 
Detached Thoughts on Books to Barbara S— inclusive 

No. of            Sentences
words          A       B     Total
  1-  5       29      30      59
  6- 10      115     100     215
 11- 15      111     100     211
 16- 20       61      85     146
 21- 25       62      56     118
 26- 30       36      46      82
 31- 35       36      46      82
 36- 40       21      29      50
 41- 45       16      19      35
 46- 50       19      16      35
 51- 55       13      18      31
 56- 60        5       6      11
 61- 65       15      11      26
 66- 70        2       5       7
 71- 75        7       8      15
 76- 80        3       8      11
 81- 85        7       6      13
 86- 90        3       -       3
 91- 95        5       2       7
 96-100        2       1       3
101-105        3       1       4
106-110        1       -       1
111-115        1       1       2
116-120        1       -       1
121-125        1       1       2
126-130        1       2       3
131-135        1       -       1
136-140        2       1       3
171-175        -       1       1
Total        579     599    1178
TABLE D 
Macaulay 


A, from first portion of essay on Lord Bacon (1837). B, from first 
portion of essay on The Earl of Chatham (1844) 


Sentences Sentences 
No. of No. of 
words words | 
A B Total B Total | 
Sere 
| | 
l- 5 26 20 46 71l- 75 4 4 
6-10 100 104 204 76-80 | 4 4 8 
11-15 126 126 252 81- 85 | 2 2 
16-20 89 lll 200 86-— 90 2 - 2 
21-25 82 104 186 91— 95 as 1 1 
26-30 51 57 108 96-100 | 1 1 2 
31-35 26 35 61 101-105 | 1 1 
36-40 29 39 68 106-110 
41-45 16 22 38 111-115 | 1 1 
46-50 10 14 24 116-120 | - 
51-55 12 8 20 121-125 | | | 1 
56-60 3 12 | 
66-70 2 = oe Total | 601 | 650 | 1251 | 
TABLE E 
Imitatio Christi 
A, from Lib. I, II and IV. B, from Lib. III 

No. of            Sentences
words          A       B     Total
  1-  5        8      31      39
  6- 10      142     160     302
 11- 15      201     175     376
 16- 20      108     129     237
 21- 25       72      47     119
 26- 30       33      19      52
 31- 35       23      19      42
 36- 40       11       9      20
 41- 45        3       5       8
 46- 50        6       5      11
 51- 55        6       1       7
 56- 60        1       1       2
 61- 65        1       1       2
 66- 70        1       1       2
 71- 75        -       -       -
 76- 80        -       1       1
106-110        1       -       1
Total        617     604    1221
TABLE F 
Miscellaneous admitted works of Thomas a Kempis 
For details as to the sources of samples A and B see text 


| 
Sentences Sentences 
No. of No. of 
words words | 
A B Total A | Bs! Total 
| | 
1-5 33 14 47 51-55 8 
6-10 153 98 251 56-60 +3 3 
11-15 165 168 333 61-65 2 | — 2 
16-20 100 117 217 66-70 — 2 2 
21-25 65 72 137 71-75 1 1 2 
26-30 40 57 97 76-80 — 1 1 
31-35 22 35 57 81-85 1 — 1 
36-40 6 14 | 20 86-90 _ 1 1 
41-45 10 91-95 1 1 2 
46-50 5 7 
Total 608 604 1212 
TABLE G 
Gerson, Opera. Selected samples 
For details see text 

No. of            Sentences
words          A       B     Total
  1-  5       30      29      59
  6- 10       85      81     166
 11- 15      108     115     223
 16- 20      101      90     191
 21- 25       68      78     146
 26- 30       46      66     112
 31- 35       53      45      98
 36- 40       28      32      60
 41- 45       28      25      53
 46- 50       22      19      41
 51- 55       14       8      22
 56- 60        7       6      13
 61- 65        7       4      11
 66- 70        3       5       8
 71- 75        2       -       2
 76- 80        2       2       4
 81- 85        -       1       1
 86- 90        -       2       2
 91- 95        1       2       3
111-115        1       -       1
131-135        -       1       1
Total        606     611    1217
TABLE H 


Gerson, Opera. Random passages 
For details see text 


| Sentences | Sentences 
No. of No. of | 
words | words 
A | B | Total | A B | Total 
| 23 34 61- 65 6 
6-10 99 97 | 196 66-— 70 4 6 10 
11-15 97 lll | 208 71- 75 3 2 5 
16-20 105 98 | 203 76— 80 2 2 4 
21-25 75 80 155 81- 85 1 
26-30 48 53 101 86-— 90 1 -— | 1 
31-35 43 26 69 91- 95 1 1 2 
36-40 32 33 65 96-100 1 _ 1 
41-45 25 16 | 41 — — 
46-50 19 20 | 39 121-125 l a 1 
51-55 6 9 3 15 126-130 1 — | 1 
56-60 a 6 13 ¢ | 
Total 600 600 1200 
TABLE J 
Graunt’s Observations upon the Bills of Mortality 
A, B, C, first, second and third portions: the whole included 
apart from some omissions (see text) 

No. of                Sentences
words          A       B       C     Total
  1-  5        -       -       -       -
  6- 10        3       2       7      12
 11- 15        2       9       2      13
 16- 20        5       9       9      23
 21- 25        8      12       9      29
 26- 30        8      11       6      25
 31- 35       12       8      20      40
 36- 40       10      10       8      28
 41- 45        8       8       8      24
 46- 50        8       6       1      15
 51- 55        9       9       5      23
 56- 60        8       3       4      15
 61- 65        4       3       5      12
 66- 70        5       4       6      15
 71- 75        5       2       5      12
 76- 80        3       3       2       8
 81- 85        2       2       4       8
 86- 90        2       4       2       8
 91- 95        2       -       1       3
 96-100        -       1       4       5
101-105        1       -       -       1
106-110        1       1       1       3
111-115        -       -       -       -
116-120        1       2       -       3
121-125        1       1       1       3
126-130        2       -       -       2
131-135        -       -       -       -
136-140        1       -       -       1
151-155        -       -       1       1
156-160        -       1       1       2
211-215        -       1       -       1
Total        111     112     112     335
TABLE K 
Petty 


A, Political Arithmetic, 300 sentences, with 34 added from the Treatise of Taxes. 
B, Treatise of Taxes. C, random passages (see text) 


Sentences Sentences 
No. of No. of 
words words 
A B C A B C 
5 1 — 131-135 5 1 
6— 10 4 3 6 136-140 3 2 2 
ll- 15 3 8 13 141-145 2 4 3 
16— 20 ll 21 17 146-150 4 3 4 
21-— 25 16 17 26 151-155 2 1 2 
26-— 30 20 20 31 156-160 1 — 1 
31— 35 26 16 30 161-165 1 3 — 
36- 40 22 31 27 166-170 1 1 - 
4l— 45 18 28 24 171-175 1 1 — 
46— 50 28 19 18 176-180 — — 
51l-— 55 12 18 ll 181-185 1 ~ 
56— 60 21 15 14 186-190 —- 1 - 
65 23 14 16 191-195 
66— 70 16 16 ll 196-200 l 
71-— 75 10 13 10 201-205 — - 
76— 80 14 15 8 206-210 
8l-— 85 10 7 12 211-215 2 1 2 
86— 90 14 10 ll 216-220 — 
91-— 95 6 9 5 221-225 1 — 1 
96-100 5 8 4 226-230 — 
101-105 3 4 2 231-235 1 — 1 
106-110 5 10 5 236-240 — —_— —_— 
111-115 3 2 1 241-245 1 — — 
116-120 4 5 — 
121-125 5 3 5 256-260 1 ~~ — 
126-130 6 — 3 
Total 334 334 334 
THE ESTIMATION OF THE LOCATION AND SCALE 
PARAMETERS OF A CONTINUOUS POPULATION OF 
ANY GIVEN FORM 


By E. J. G. PITMAN, University of Tasmania 


1. INTRODUCTORY 

In this paper we shall be concerned only with continuous chance variables which 
have “elementary” probability functions, i.e. if X is any chance variable con- 
sidered, we shall assume that there exists a non-negative function f(x), defined 
and continuous at almost every real value of x, such that the probability that X 
lies in any interval is equal to the integral of f(x) over that interval. We shall call 
f(x) simply the probability function of X. 

The essential problem of estimation may be stated as follows. We have a 
sample consisting of n independently observed values of X, 


$$x_1, x_2, \ldots, x_n.$$
The probability function of X, $f(x, \theta_1, \theta_2, \ldots)$, is of known form but involves 
certain parameters $\theta_1, \theta_2, \ldots$ whose values are not known, and we wish to estimate 
these values from an examination of the sample. 

The sample may be specified by a point (the sample point) whose Cartesian 
co-ordinates are $(x_1, \ldots, x_n)$ in an n-dimensional space W (the sample space). For 
the co-ordinates of a variable point in this space we shall use 
$$\xi_1, \xi_2, \ldots, \xi_n.$$
We shall write 
$$F = \prod_{r=1}^{n} f(\xi_r, \theta_1, \theta_2, \ldots),$$
and call F the probability of the sample $\xi_1, \ldots, \xi_n$. Throughout the paper we shall 
denote by H a function of the $\xi$ such that 
$$E(H) = \int_W F H \, d\xi_1 \cdots d\xi_n$$
exists, E denoting expectation, or mean value. The points of W where F is not 
zero form a region which we shall denote by $W_+$; it will in general depend on the 
particular values of the $\theta$. A line which contains internal points of $W_+$ will be said 
to belong to $W_+$. 

This paper develops a general method of solving problems of estimation in 
which the unknown parameters are “location” or “scale” parameters. We suppose 
that the probability function of X is 
$$\frac{1}{c}\, f\!\left(\frac{x - a}{c}\right),$$
and that the function f(x) is known but that one or both of the parameters a, c, 
which determine respectively the location and the scale of the distribution of X, 
is unknown. This general problem has been considered by Fisher (1934, p. 303), 
and the method of this paper is very closely related to Fisher’s; but there is a 
difference in the approach to the problem, and perhaps also in the final point of 
view. Also, a number of questions not discussed by Fisher are dealt with here in 
detail. The approach to the problem is essentially on the lines of Neyman & 
Pearson (1936) and Neyman (1937), and I have purposely adopted a good deal of 
the notation and terminology of these writers.* 

A probability function f(x) which is such that x f(x) remains bounded as x 
tends to $\infty$ or to $-\infty$ or to 0, will be said to possess the property $\kappa_1$. It is obvious 
that if this is the case, and if $0 \le m < n - 1$, 

$$\int_{-\infty}^{\infty} t^m f(x_1 - t) \cdots f(x_n - t)\, dt$$

is convergent for all sets of values of the x when f(x) is bounded, and for almost all 
sets of values when f(x) is unbounded. In the latter case the values of $x_1, \ldots, x_n$ 
could be so chosen that several of the functions $f(x_1 - t), f(x_2 - t), \ldots$ would become 
infinite for some finite value of t, and this might prevent the convergence. By 
the substitution v = 1/t we can show that, if f(x) has the property $\kappa_1$, 


is convergent for all sets of values of the x when f(x) is bounded, and for almost all 
sets of values when f(x) is unbounded. If in addition f(x) is bounded in the 
neighbourhood of 0, and $0 \le m < n - 1$, 


is convergent for all values of the x when f(x) is bounded, and almost all values of 
the x when f(x) is unbounded (but still bounded in the neighbourhood of 0). If 
f(x) is a monotonic function of x when | x | is sufficiently large, and is also either 
bounded in the neighbourhood of 0 or monotonic on each side of 0, it will possess 
the property $\kappa_1$, for, as is well known, from the convergence of 


dx and 


it follows that x f(x) tends to 0 as x tends to $\infty$ or $-\infty$ or 0. Thus all ordinary 
probability functions have this property. 


* I do not agree with the statement (Neyman, 1938, p. 158) that the theory of confidence 
intervals and the theory of fiducial probability are two different things, and I hope that this paper 
may help to show that they are essentially the same and that their two points of view are both 
necessary for a full comprehension of the theory of estimation. 

The relation between direct and inverse methods in statistics has been discussed by Jeffreys 
(1937). With the proper a priori probability distribution of a parameter, the results of the two 
methods are formally similar; but, of course, essentially different problems are being dealt with by 
the two “methods”. However, the properties of “estimators”, with which the present paper is 
largely concerned, are true whichever problem is being discussed. 

[For some comment on what appears to be a real difference between Neyman’s theory of 
confidence intervals and the approach of the present paper, see a Note in the Miscellanea section 
below. Ed.] 
If $x \log |x| \, f(x)$ remains bounded as x tends to $\infty$ or $-\infty$ or 0, we shall say that 
f(x) possesses the property $\kappa_2$. By means of the substitution $t = -\log u$, we can 


f (x, e-*) ... f(x, dt, 


where $0 \le m < n - 1$, is convergent for all sets of values of the x when f(x) is bounded, 
and for almost all sets of values when f(x) is not bounded. 


2. THE ESTIMATION OF a 
Here we take the probability function of X as 
$$f(x - a)$$
and a is to be estimated. In accordance with the notation of § 1 we write 
$$F = \prod_{r=1}^{n} f(\xi_r - a).$$
Make the change of co-ordinates, 
$$z_1 = \xi_1, \qquad z_r = \xi_r - \xi_1 \quad (r = 2, 3, \ldots, n).$$
The Jacobian of the transformation is 1, so that over any part of W 
$$\int F H \, d\xi_1 \cdots d\xi_n = \int F H \, dz_1 \cdots dz_n.$$
The locus, $z_2, z_3, \ldots, z_n$ all constant, 
is a straight line parallel to the line 
$$\xi_1 = \xi_2 = \cdots = \xi_n.$$
Any such line for which 
$$\int_{-\infty}^{\infty} F \, dz_1 > 0$$
will be denoted by L. The family of lines L will be the same for all values of a. 

A point $(x_1, \ldots, x_n)$ which is on some L will be called an observable point.* We 
shall write 
$$E_L(H) = \frac{\int_{-\infty}^{\infty} F H \, dz_1}{\int_{-\infty}^{\infty} F \, dz_1},$$
and call $E_L(H)$ the mean value of H on L. Since 
$$E(H) = \int F H \, dz_1 \cdots dz_n = \int E_L(H) \left\{ \int_{-\infty}^{\infty} F \, dz_1 \right\} dz_2 \cdots dz_n,$$
it is evident that if $E_L(H) = h$ (constant) for every L, 
$$E(H) = h \int F \, dz_1 \cdots dz_n = h;$$
while if $E_L(H) > h$, $E(H) > h$. 

* If there is no interval throughout which f(x) is zero, all points will be observable; if f(x) 
vanishes outside a certain finite interval some points will not be observable. Each L is the locus of 
the sample point corresponding to a given “configuration” (Fisher, 1934, p. 301). 
If $I'$ is a set of intervals on L we write 
$$P\{I' \mid L\} = \frac{\int_{I'} F \, dz_1}{\int_{-\infty}^{\infty} F \, dz_1}.$$
Suppose that $I'$ is determined on every L; denote by $w'$ the region formed by all 
the $I'$ and by $P\{w'\}$ the probability that the sample point will fall in $w'$. If 
$P\{I' \mid L\}$ has the same value $\alpha$ on every L, then $P\{w'\} = \alpha$, while if $P\{I' \mid L\} > \beta$ (constant), 
$P\{w'\} > \beta$. This can be proved in the same way, or it can be deduced from 
the previous result by defining H to have the value 1 at a point of an $I'$, and the 
value 0 at any other point.* 

* It is assumed that $I'$ varies with L in such a way that H is almost everywhere continuous. 
This is ensured in the applications. 

If $(x_1, \ldots, x_n)$ is any fixed point on L, the co-ordinates $(\xi_1, \ldots, \xi_n)$ of any point 
on L may be expressed in the form 
$$\xi_r - a = x_r - t \quad (r = 1, 2, \ldots, n).$$
We then have 
$$\int_{-\infty}^{\infty} F \, dz_1 = \int_{-\infty}^{\infty} f(x_1 - t) \cdots f(x_n - t) \, dt$$
and 
$$\int_{-\infty}^{\infty} F H \, dz_1 = \int_{-\infty}^{\infty} H f(x_1 - t) \cdots f(x_n - t) \, dt.$$

Let I denote a set of non-overlapping intervals in $(-\infty, \infty)$. The points of the 
line L corresponding to values of t lying in I will form a set of intervals $I'$ on L. 
We shall call I “proper” if its end-points, 
$$\lambda_1, \lambda_2, \ldots,$$
are functions of $x_1, \ldots, x_n$, not involving a, such that $I'$ is independent of the 
particular fixed point $(x_1, \ldots, x_n)$ on L. The necessary and sufficient condition for 
this is obviously 
$$\lambda_r(x_1 + \lambda, \ldots, x_n + \lambda) = \lambda_r(x_1, \ldots, x_n) + \lambda \quad (r = 1, 2, \ldots).$$
It should be noted that $\pm\infty$ are suitable end-points. Throughout the discussion it 
is to be understood that I is proper. I depends on $x_1, \ldots, x_n$ but not on a; we express 
this by 
$$I = I(x_1, \ldots, x_n).$$

On the other hand, $I'$, which is independent of the particular point $(x_1, \ldots, x_n)$, 
does depend on a. Change of a from $a_1$ to $a_2$ will increase all the $\xi$ co-ordinates of 
each of the end-points of $I'$ by $a_2 - a_1$, so that $I'$ will simply slide along L through 
a distance $(a_2 - a_1)\sqrt{n}$ in the positive direction ($\xi_1$ increasing) of L. 

Let 
$$P\{I' \mid L\} = \frac{\int_{I'} F \, dz_1}{\int_{-\infty}^{\infty} F \, dz_1} = \alpha. \tag{1}$$
In some of the applications of the theory, $\alpha$ is given and we have to determine I; 
in others, I is given and $\alpha$ is a function of it. If $\alpha$ were any given number between 
0 and 1, we might determine I by (1) together with the requirement that the sum 
of the lengths of the intervals of I is to be a minimum. In general, I will then be 
uniquely determined; if so, it will be proper. 

The points of L corresponding to values of t lying in I, i.e. the points of $I'$, will 
be called points of acceptance. Since for the point $(x_1, \ldots, x_n)$ itself, $t = a$, the 
necessary and sufficient condition for this point to be a point of acceptance is that 
a lies in $I(x_1, \ldots, x_n)$, which we shall write 
$$a \in I(x_1, \ldots, x_n).$$
If points of acceptance are determined on every line L, they will form a region of 
acceptance $w'(a)$.* The remainder of the sample space will be called the critical 
region $w(a)$.† If $\alpha$ has the same value on every L, the probability, $P\{w'(a)\}$, that 
the sample point will fall in the region of acceptance is $\alpha$, while if for every L, 
$\alpha > \beta$ (constant), the probability is greater than $\beta$. It will still be independent of 
the particular value of a. 

The effect of a change in the value of a, say from $a_1$ to $a_2$, will be simply to 
move the region of acceptance, without change of form, through a distance 
$(a_2 - a_1)\sqrt{n}$ in the positive direction of the lines L. From this it easily follows that 
if I is chosen so that the sum of its lengths is a minimum for the corresponding $\alpha$, 
and therefore the sum of the lengths of $I'$ is also a minimum, then when $a = a_1$, the 
probability that the sample point falls in $w'(a_1)$ is greater than the probability 
that it falls in $w'(a_2)$. Using the notation and terminology of Neyman & 
Pearson (1936, p. 8), we have, with this choice of I, 
$$P\{E \in w'(a_1) \mid a_1\} > P\{E \in w'(a_2) \mid a_1\},$$
and therefore 
$$P\{E \in w(a_1) \mid a_1\} < P\{E \in w(a_2) \mid a_1\}.$$
Since 
$$P\{E \in w(a_2) \mid a_2\} = P\{E \in w(a_1) \mid a_1\},$$
this gives 
$$P\{E \in w(a_2) \mid a_2\} < P\{E \in w(a_2) \mid a_1\},$$
and so the critical region $w(a)$ is “unbiased”. If the shortest I is not always 
unique, we shall have to replace the sign $<$ in the last statement by $\le$. 

The relation between $\alpha$ and $I(x_1, \ldots, x_n)$ is 
$$\int_I f(x_1 - t) \cdots f(x_n - t) \, dt = \alpha \int_{-\infty}^{\infty} f(x_1 - t) \cdots f(x_n - t) \, dt. \tag{2}$$
It is convenient in practice to replace‡ the symbol t in (2) by the symbol a, and we 
then have 
$$k \int_I f(x_1 - a) \cdots f(x_n - a) \, da = \alpha, \tag{3}$$
where 
$$\frac{1}{k} = \int_{-\infty}^{\infty} f(x_1 - a) \cdots f(x_n - a) \, da.$$

* Cf. Neyman (1937, p. 351). † Cf. Neyman & Pearson (1936, p. 5). 

‡ This replacement could not have been made earlier without confusion; at this stage t is a 
mere dummy. 
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The definition of an observable point (x,, ...,2,,) now takes the form 
f(x,—4@) ... f(x, —a)da>0. 


We have seen that the necessary and sufficient condition for the point
$(x_1, \ldots, x_n)$ to be a point of acceptance is
$$a \in I(x_1, \ldots, x_n).$$

When $\alpha$ is constant, the probability that the sample point $(x_1, \ldots, x_n)$ is a point of
acceptance is $\alpha$. Hence
$$P\{a \in I(x_1, \ldots, x_n)\} = \alpha.$$
If $\alpha > \beta$ (constant), we shall have
$$P\{a \in I(x_1, \ldots, x_n)\} > \beta.$$
We may sum up our results in the following theorem. If $I(x_1, \ldots, x_n)$ is proper, and
defined for every observable point $(x_1, \ldots, x_n)$, and if
$$k\int_I f(x_1-a)\cdots f(x_n-a)\,da = \alpha \text{ (constant)},$$
where
$$k\int_{-\infty}^{\infty} f(x_1-a)\cdots f(x_n-a)\,da = 1,$$
then
$$P\{a \in I(x_1, \ldots, x_n)\} = \alpha,$$
while if
$$k\int_I f(x_1-a)\cdots f(x_n-a)\,da > \beta \text{ (constant)},$$
$$P\{a \in I(x_1, \ldots, x_n)\} > \beta.$$
We shall express all this by saying that the fiducial function* for the estimation
of $a$ is $k f(x_1-a)\cdots f(x_n-a)$.
We shall denote this function by $g(a)$.
The statement
$$a \in I(x_1, \ldots, x_n) \tag{4}$$
is a variable statement which is a function of $x_1, \ldots, x_n$. When particular, actually
observed values of $x_1, \ldots, x_n$ are inserted in it, we obtain a definite statement about
the unknown parameter $a$ that is either true or false, and we shall not know which
it is; but we do know that the probability that the variable statement (4), when
used in this way, will give a true particular statement about $a$ is $\alpha$ (supposed
constant). As R. A. Fisher expresses it, the fiducial probability of the variable
statement (4) is $\alpha$. If we decide upon $\alpha$, say 0·95, and then define $I$ accordingly,
we shall have a rule for automatically making a definite statement about the 
unknown parameter a whenever a set of values of the chance variable X is 


observed. A statistician using this rule can expect to be right about 95 times out 
of 100. 
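
As a purely illustrative sketch of the rule just described (not part of the original argument), the fiducial function $g(a) = k f(x_1-a)\cdots f(x_n-a)$ can be tabulated on a grid and its quantiles read off numerically. The function name `fiducial_quantiles`, the grid limits, and the sample values are assumptions made for the example; the normal form of $f$ with unit standard deviation is chosen only because the answer can be checked against the familiar limits $\bar{x} \pm 1.96/\sqrt{n}$. For a symmetrical unimodal $g$ the equal-tailed range computed here coincides with the shortest one.

```python
import math

def fiducial_quantiles(xs, f, probs, lo, hi, steps=20000):
    """Quantiles of the fiducial distribution g(a) = k * prod_r f(x_r - a).

    The grid limits lo and hi are assumptions: they must enclose
    essentially all of the mass of g.
    """
    h = (hi - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    g = [math.prod(f(x - a) for x in xs) for a in grid]
    # Cumulative trapezoidal integral; its final value equals 1/k.
    cum = [0.0]
    for i in range(1, len(grid)):
        cum.append(cum[-1] + 0.5 * h * (g[i - 1] + g[i]))
    out = []
    for p in probs:
        target = p * cum[-1]
        i = next(j for j in range(1, len(cum)) if cum[j] >= target)
        frac = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
        out.append(grid[i - 1] + frac * h)  # linear interpolation
    return out

def normal_density(z):
    """Standard normal probability function (sigma assumed known and equal to 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

sample = [0.3, -1.1, 0.8, 1.9, 0.1]   # hypothetical observations
alpha = 0.95
lower, upper = fiducial_quantiles(
    sample, normal_density, [(1 - alpha) / 2, (1 + alpha) / 2], -5.0, 6.0
)
print(lower, upper)
```

The same routine applies to any bounded $f$; only the density function and the grid limits need to change.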


* I at first called it the fiducial probability function, but finally decided to shorten the name by
dropping the word "probability". As will be seen later, problems of estimation can be dealt with
completely, and very simply, by means of the fiducial function.
Suppose that
$$f(x) = 2x, \quad 0 \le x \le 1,$$
and $f(x) = 0$ elsewhere.
Let $x_1, \ldots, x_n$ be a sample from the (triangular) population with probability
function $f(x-a)$. The distribution extends only from $a$ to $a+1$. We shall denote
the smallest and the largest of the $x$ by $x_S$ and $x_L$ respectively. Since $f(x)$ vanishes
outside the range $(0, 1)$, the fiducial function for the estimation of $a$ is
$$2^n k\,(x_1-a)\cdots(x_n-a),$$
if $x_S \ge a$ and $x_L \le a+1$, i.e. if $x_L - 1 \le a \le x_S$, and is zero for all other values of $a$.


Thus 
S 


Since the fiducial function vanishes outside the interval $(x_L - 1, x_S)$ and is
monotonic decreasing in this interval, the shortest $I$ will consist of a single interval with
its lower end at $x_L - 1$. Thus the shortest $I$ will be the interval $(x_L - 1, h)$, where
$$\int_{x_L-1}^{h} (x_1-a)\cdots(x_n-a)\,da = \alpha\int_{x_L-1}^{x_S} (x_1-a)\cdots(x_n-a)\,da,$$
that is
$$G(h) - G(x_L-1) = \alpha\{G(x_S) - G(x_L-1)\}, \tag{5}$$
where
$$G(a) = \int_0^a (x_1-a)\cdots(x_n-a)\,da,$$
a polynomial of the $(n+1)$th degree in $a$. Thus (5) is an equation of the $(n+1)$th
degree to determine $h$. It will have a single root in the range $(x_L-1, x_S)$. With
this value of $h$, the statement
$$x_L - 1 \le a \le h$$
has fiducial probability $\alpha$.
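
Equation (5) can be solved numerically without difficulty. The following is a minimal sketch, assuming the triangular population $f(x) = 2x$ on $(0, 1)$ described above; the helper names are hypothetical, and bisection is used simply because $G$ is increasing between $x_L - 1$ and $x_S$ (any root-finder would serve).

```python
def product_polynomial(xs):
    """Coefficients, in ascending powers of a, of prod_r (x_r - a)."""
    coeffs = [1.0]
    for x in xs:
        new = [0.0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] += x * c      # contribution of the factor's constant term x_r
            new[i + 1] -= c      # contribution of the factor's term -a
        coeffs = new
    return coeffs

def antiderivative(coeffs):
    """Coefficients of G(a), the integral from 0 to a of the polynomial."""
    return [0.0] + [c / (i + 1) for i, c in enumerate(coeffs)]

def evaluate(coeffs, a):
    """Horner evaluation of a polynomial given in ascending powers."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * a + c
    return result

def shortest_triangular_range(xs, alpha, tol=1e-12):
    """Return (x_L - 1, h), with h the root of equation (5)."""
    x_small, x_large = min(xs), max(xs)
    if x_large - x_small >= 1.0:
        raise ValueError("sample is not observable for this population")
    G = antiderivative(product_polynomial(xs))
    lower = x_large - 1.0
    g_lower = evaluate(G, lower)
    target = g_lower + alpha * (evaluate(G, x_small) - g_lower)
    lo, hi = lower, x_small
    while hi - lo > tol:         # G is increasing on (x_L - 1, x_S)
        mid = 0.5 * (lo + hi)
        if evaluate(G, mid) < target:
            lo = mid
        else:
            hi = mid
    return lower, 0.5 * (lo + hi)

print(shortest_triangular_range([0.5], 0.75))
```

For a single observation $x_1 = 0.5$ the fiducial function is $2(0.5 - a)$ on $(-0.5, 0.5)$, and the range with fiducial probability $0.75$ ends at $h = 0$, which gives a simple hand check.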

If $g(a)$, the fiducial function for the estimation of $a$, is for all values of the $x$ a
unimodal function of $a$, i.e. if it is a strictly monotonic function of $a$ in $b_1 \le a \le b_2$,
and zero outside $(b_1, b_2)$, or if it is strictly increasing in $b_1 \le a \le b_3$, strictly
decreasing in $b_3 \le a \le b_2$, and zero outside $(b_1, b_2)$, the shortest $I$ will always be unique
and will consist of a single interval. Any point of acceptance on $L$ will have a
greater probability (or likelihood) than any point on $L$ which is outside $I'$, and,
whatever the value of $\alpha$, $I$ will include the maximum likelihood estimate of $a$.
This is the value of $a$ which makes
$$f(x_1-a)\cdots f(x_n-a)$$
a maximum. We shall denote it by $A_L$.


A sufficient condition for g(a) to be unimodal for all values of the x is that f(x) 
satisfy either of the following conditions: 


(i) f(x) strictly monotonic over a certain range of x and zero outside that 
range; 


(ii) $\log f(x)$ a concave function of $x$ over a certain range of $x$ and $f(x)$ zero
outside that range.


This is easily proved by using the relation
$$\log g(a) = \log k + \sum_{r=1}^{n} \log f(x_r - a),$$

and remembering that the sum of any number of strictly monotonic functions of 
the same type (increasing or decreasing) is strictly monotonic, the sum of any 
number of concave functions is concave, and that a concave function is unimodal. 
The normal, the gamma, the beta (except when U-shaped), the triangular, and the 
trapezoidal (except rectangular) distributions all have probability functions 
which satisfy (i) or (ii). 

We have so far been discussing estimation by interval.* This is what is 
required in statistical tests; but in practice it is often necessary to decide on some 
definite number as our estimate of a. Any such estimate will be the value of some 


function of the sample values
$$A(x_1, \ldots, x_n),$$
which does not involve the unknown parameter $a$. If we have no source of
knowledge of the value of $a$ except the observed sample, any principle of estimation
which would assign the value $a_1$ to $a$ when the observed values of $X$ were
$$x_1, \ldots, x_n$$
would assign the value $a_1 + \lambda$ to $a$ when the observed values were
$$x_1 + \lambda, \ldots, x_n + \lambda.$$
The function $A$ must therefore satisfy the relation
$$A(x_1 + \lambda, \ldots, x_n + \lambda) = A(x_1, \ldots, x_n) + \lambda.$$

Any function which satisfies this relation will be called an estimator of $a$. We note
that the end-points of a proper $I$ must all be estimators, including in this category
$\pm\infty$, which formally have the estimator property, $\pm\infty + \lambda = \pm\infty$. For a
particular population, an estimator $A$ will be a chance variable with a definite
distribution. It is easy to see that for a population of given form the distribution 
of the chance variable A —a is independent of the particular value of the popula- 
tion parameter a. The practical requirement is an estimator A whose distribution 
is such that it is not likely to differ very much from the true value of a. 
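
The estimator property is easy to check numerically for any proposed function of the sample. The sketch below is an illustration only: the helper `has_estimator_property`, the sample, and the shifts are assumptions, and passing the check on one sample is of course evidence rather than proof.

```python
import statistics

def midrange(xs):
    """Half the sum of the smallest and largest observations."""
    return 0.5 * (min(xs) + max(xs))

def mean_square(xs):
    """A function of the sample that is NOT an estimator of a in the text's sense."""
    return sum(x * x for x in xs) / len(xs)

def has_estimator_property(func, xs, shifts=(-3.0, 0.5, 10.0), tol=1e-9):
    """Check A(x_1 + lam, ..., x_n + lam) == A(x_1, ..., x_n) + lam on one sample."""
    base = func(xs)
    return all(
        abs(func([x + lam for x in xs]) - (base + lam)) <= tol for lam in shifts
    )

sample = [2.1, 0.4, 3.3, 1.8, 2.9]   # hypothetical observations
for name, func in [("mean", statistics.fmean), ("median", statistics.median),
                   ("midrange", midrange), ("mean square", mean_square)]:
    print(name, has_estimator_property(func, sample))
```

The mean, median and midrange all shift with the sample, while the mean square does not.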

For points on the line $L$ through $(x_1, \ldots, x_n)$,
$$A(\xi_1, \ldots, \xi_n) - a = A(x_1, \ldots, x_n) - t.$$
Hence on any line $L$ the difference between two estimators is constant.
The fiducial function
$$g(a) = k f(x_1-a)\cdots f(x_n-a)$$
is defined, non-negative, and integrable in $-\infty < a < \infty$ when $f(x)$ is bounded.


* Cf. Neyman (1937, p. 346).
When $f(x)$ is unbounded, the statement is true for almost all sets of values of the $x$.
Hence, apart from any questions of probability, it may be looked on as the
elementary frequency function of a continuous distribution. This distribution we
shall call the fiducial distribution of $a$ determined by $x_1, \ldots, x_n$. If $f(x)$ has the
property «x, of §1, 


$$\int_{-\infty}^{\infty} a^{n-1} g(a)\,da = k\int_{-\infty}^{\infty} a^{n-1} f(x_1-a)\cdots f(x_n-a)\,da$$


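exists for all, or almost all, values of the $x$, and the moments of the fiducial
distribution, up to the $(n-1)$th at least, exist. If $\phi(a)$ is any function of $a$, we shall
write
$$E_f\{\phi(a)\} = \int_{-\infty}^{\infty} \phi(a)\,g(a)\,da.$$
The mean value of $(A-a)^m$ on $L$ is
$$E_L\{(A-a)^m\} = \frac{\int_{-\infty}^{\infty} \{A(x_1, \ldots, x_n) - t\}^m f(x_1-t)\cdots f(x_n-t)\,dt}{\int_{-\infty}^{\infty} f(x_1-t)\cdots f(x_n-t)\,dt}$$
$$= \int_{-\infty}^{\infty} (A-a)^m g(a)\,da = E_f\{(A-a)^m\},$$
where it is to be understood that in the last line $A$ means $A(x_1, \ldots, x_n)$. Similarly
$$E_L\{|A-a|^m\} = E_f\{|A-a|^m\}.$$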

The mean, median, or any such point of the fiducial distribution is a function 
of the x which has the estimator property. This is so because an increase of each 
of the numbers $x_1, \ldots, x_n$ by the same number $\lambda$ simply shifts the fiducial
distribution, without change of form, through a distance $\lambda$ in the positive direction.


We may take the median (assumed unique*) of the fiducial distribution as our
estimator of $a$. We shall denote it by $A_C$.
Since
$$\int_{-\infty}^{A_C} g(a)\,da = \tfrac{1}{2},$$
the probability that $-\infty < a \le A_C$
is $\tfrac{1}{2}$. Thus the median value of $A_C$ is $a$. If $A$ is any other† estimator,
$$E_L\{|A-a| - |A_C-a|\} = E_f\{|A-a| - |A_C-a|\}$$
$$= E_f\{|A-a|\} - E_f\{|A_C-a|\}$$
$$= 0 \text{ if } A = A_C \text{ on } L,$$
$$> 0 \text{ if } A \ne A_C \text{ on } L,$$


* This will be so for all values of the $x$ if, and only if, the distribution of $X$ has no gaps. When
the median estimator $A_C$ is not unique, the theorems will still hold provided $A$ is not a median
estimator.


† Estimators which are identical for almost all values of the $x$ are regarded as not different.


since the mean absolute deviation of the fiducial distribution is a minimum about
its median $A_C$. Hence
$$E\{|A-a| - |A_C-a|\} > 0,$$
that is
$$E\{|A-a|\} > E\{|A_C-a|\}.$$

Thus $A_C$ is the estimator with the smallest mean absolute error. It has another
important and interesting property which entitles it to be called the "closest"*
estimator of $a$. It is likely to be nearer to the true value of $a$ than any other
estimator; more precisely, the probability that
$$|A_C - a| \le |A - a|$$
is greater than $\tfrac{1}{2}$. Define $I$ as $(-\infty, \infty)$ if $A(x_1, \ldots, x_n)$ is equal to $A_C(x_1, \ldots, x_n)$,
and as the interval extending from $\tfrac{1}{2}(A + A_C)$ to $\infty$ or $-\infty$ which includes $A_C$ if $A$
and $A_C$ are not equal at $(x_1, \ldots, x_n)$. In either case
$$\int_I g(a)\,da > \tfrac{1}{2}.$$
Hence $P\{a \in I\} > \tfrac{1}{2}$;
but $a \in I$
implies $|A_C - a| \le |A - a|$.


Another important estimator is $A_M$, defined by
$$A_M = E_f(a) = \int_{-\infty}^{\infty} a\,g(a)\,da.$$
Its mean value is $a$ since
$$E_L(A_M - a) = E_f(A_M - a) = A_M - E_f(a) = 0,$$
and therefore $E(A_M - a) = 0$.


By the method used to establish (6) we can show that it is the estimator with the
smallest mean square error
$$E\{(A_M - a)^2\} < E\{(A - a)^2\}. \tag{7}$$
The expression on the left-hand side of (7) is the variance of $A_M$; but the right-hand
expression is not the variance of $A$ unless $E(A) = a$. However, we can prove
that not only is (7) true, but also the variance of $A_M$ is less than the variance of $A$
unless $A_M - A$ is constant. If $E(A) = a + h$, replace $A$ in (7) by $A - h$, and the
result follows. If the chance variable $X$ has a finite standard deviation, $\sigma$, the
variance of the sample mean, $\bar{x} = (\sum x_r)/n$,
is $\sigma^2/n$. Since $\bar{x}$ has the estimator property, this implies
$$E\{(A_M - a)^2\} < \sigma^2/n,$$
unless $A_M - \bar{x}$ is constant.


* Cf. Pitman (1937). At the time of writing that paper I had not thought of using the word
"estimator" to make a clear distinction between the function of the sample values and its value in
a particular observation, which is what we take as our "estimate" of $a$.
If we define $A_r$ by
$$E_f\{|A_r - a|^r\} \text{ a minimum},$$
$A_r$ will be the estimator with the smallest mean $r$th power absolute error
$$E\{|A_r - a|^r\} < E\{|A - a|^r\}.$$

The maximum likelihood estimator $A_L$, mentioned above, is defined by $g(A_L)$ a
maximum, and its value is the abscissa of the mode of the fiducial distribution.
The mode of its distribution is $a$, and it is always included in the shortest $I$.
Except in simple cases like the normal and exponential populations, its approxi- 
mate numerical value will usually be easier to determine than that of any of the 


other estimators discussed above. Apart from these it seems to have no special 
advantages. 


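For the normal population
$$f(x-a) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-a)^2}{2\sigma^2}\right\},$$
and
$$g(a) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{n(a-\bar{x})^2}{2\sigma^2}\right\}, \quad \bar{x} = \frac{1}{n}\sum x_r,$$
where $\sigma$ is supposed to be known. The fiducial distribution of $a$ is normal with
mean $\bar{x}$ and standard deviation $\sigma/\sqrt{n}$. In this case the estimators discussed above
all coincide, $A_C = A_M = A_L = A_r = \bar{x}$.

In this case $\bar{x}$ is the "best" estimator of $a$, the "best"* estimator being
defined as follows. An estimator $A_B$ is the best estimator of $a$ if, for all positive
values of $h$,
$$P\{|A_B - a| \le h\} \ge P\{|A - a| \le h\},$$
and, for some positive values of $h$,
$$P\{|A_B - a| \le h\} > P\{|A - a| \le h\}.$$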
If, for all values of the $x$, the fiducial distribution of $a$ is symmetrical and also
unimodal in the wider sense, i.e. if $g(a)$ is a non-decreasing function of $a$ at values
of $a$ below the centre of symmetry and consequently non-increasing above the
centre, $A_C$ is the best estimator, and
$$A_C = A_M.$$
The last part of this statement is obvious. The first part follows from the fact
that
$$\int_{A_C-h}^{A_C+h} g(a)\,da \ge \int_{A-h}^{A+h} g(a)\,da$$
for all positive values of $h$, and, when $A(x_1, \ldots, x_n)$ is not equal to $A_C(x_1, \ldots, x_n)$,
$$\int_{A_C-h}^{A_C+h} g(a)\,da > \int_{A-h}^{A+h} g(a)\,da$$

* It has been objected that the use of “best” to denote a particular kind of estimator is some- 
what provocative; but I submit that an estimator which possesses the property of the definition is 
undeniably the best.
for some positive values of $h$. The condition that the fiducial distribution be
symmetrical and unimodal in the wider sense for all values of the $x$ is obviously
also necessary for the existence of a best estimator.

The fiducial distribution of $a$ determined by a sample from the rectangular
population which extends from $a - \tfrac{1}{2}$ to $a + \tfrac{1}{2}$ is a rectangular distribution extending
from $x_L - \tfrac{1}{2}$ to $x_S + \tfrac{1}{2}$, where $x_S$ and $x_L$ denote respectively the smallest and the
largest member of the sample. $\tfrac{1}{2}(x_S + x_L)$ is the best estimator.
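
A small simulation makes the advantage of $\tfrac{1}{2}(x_S + x_L)$ concrete. This is a rough sketch under assumed settings (sample size 10, true $a = 2$, a fixed random seed, and the hypothetical function name `compare_rectangular_estimators`); it compares only the mean square errors of the midrange and the sample mean, which is a weaker statement than the "best" property defined above.

```python
import random

def compare_rectangular_estimators(n=10, a=2.0, trials=20000, seed=0):
    """Monte Carlo mean square errors of the midrange and the sample mean
    for samples from the rectangular population on (a - 1/2, a + 1/2)."""
    rng = random.Random(seed)
    se_midrange = 0.0
    se_mean = 0.0
    for _ in range(trials):
        xs = [a - 0.5 + rng.random() for _ in range(n)]
        midrange = 0.5 * (min(xs) + max(xs))
        mean = sum(xs) / n
        se_midrange += (midrange - a) ** 2
        se_mean += (mean - a) ** 2
    return se_midrange / trials, se_mean / trials

mse_midrange, mse_mean = compare_rectangular_estimators()
print(mse_midrange, mse_mean)
```

The mean square error of the sample mean should be close to its theoretical value $1/(12n)$, with the midrange noticeably smaller.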

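For the exponential population,
$$f(x-a) = e^{-(x-a)}, \quad x \ge a,$$
$$= 0, \quad x < a,$$
we have
$$g(a) = n e^{n(a - x_S)}, \quad a \le x_S,$$
$$= 0, \quad a > x_S,$$
where $x_S$ is the smallest member of the sample. Here
$$A_L = x_S, \quad A_M = x_S - 1/n, \quad A_C = x_S - (\log 2)/n.$$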
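
These closed forms can be checked by direct numerical integration of the fiducial function. The sketch below assumes the exponential population $f(x-a) = e^{-(x-a)}$ for $x \ge a$; the function name, the grid span, and the sample are illustrative assumptions, and the trapezoidal rule is used only for simplicity.

```python
import math

def exponential_fiducial_summary(xs, steps=20000, span=40.0):
    """Mode, mean and median of the fiducial distribution of a for the
    exponential population, found by integrating g(a) on a grid."""
    n = len(xs)
    x_small = min(xs)
    lo = x_small - span / n            # g(a) is negligible below this point
    h = (x_small - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    # Unnormalized fiducial function: prod_r exp(-(x_r - a)), valid for a <= x_S.
    g = [math.exp(-sum(x - a for x in xs)) for a in grid]
    total = 0.0
    first_moment = 0.0
    cum = [0.0]
    for i in range(1, len(grid)):
        total += 0.5 * h * (g[i - 1] + g[i])
        first_moment += 0.5 * h * (grid[i - 1] * g[i - 1] + grid[i] * g[i])
        cum.append(total)
    mode = grid[g.index(max(g))]
    mean = first_moment / total
    i = next(j for j in range(1, len(cum)) if cum[j] >= 0.5 * total)
    frac = (0.5 * total - cum[i - 1]) / (cum[i] - cum[i - 1])
    median = grid[i - 1] + frac * h
    return mode, mean, median

sample = [1.7, 2.4, 1.2, 3.9]          # hypothetical observations
mode, mean, median = exponential_fiducial_summary(sample)
print(mode, mean, median)
```

With these four observations the three values should agree with $x_S = 1.2$, $x_S - 1/n = 0.95$ and $x_S - (\log 2)/n$ to within the integration error.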
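For the triangular population discussed earlier in this section, $A_C$ is the value of
$h$ corresponding to $\alpha = \tfrac{1}{2}$, and
$$A_M = \frac{G_1(x_S) - G_1(x_L - 1)}{G(x_S) - G(x_L - 1)},$$
where
$$G(a) = \int_0^a (x_1-a)\cdots(x_n-a)\,da, \quad G_1(a) = \int_0^a (x_1-a)\cdots(x_n-a)\,a\,da.$$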


3. THE ESTIMATION OF c
Here the probability function of $X$ is
$$\frac{1}{c} f\left(\frac{x}{c}\right), \quad c > 0,$$
and
$$F = c^{-n} f(\xi_1/c)\cdots f(\xi_n/c).$$
If $X$ takes only positive values, we can reduce this to the previous case by
considering the distribution of $\log X$ and putting
$$\log c = y.$$
The probability function for the distribution of $\log X$ is then
$$e^{x-y} f(e^{x-y}),$$
and $y$ plays the part of $a$ in the previous discussion. The results obtained apply to
all cases; but we must establish them by a method which applies to chance
variables taking both positive and negative values. As the analysis is similar to
that in § 2, it will be given only in outline.
A function $C(x_1, \ldots, x_n)$ whose value may be used as an estimate of $c$, i.e. a $c$
estimator, must evidently satisfy
(i) $C(x_1, \ldots, x_n) \ge 0$;
(ii) $C(\lambda x_1, \ldots, \lambda x_n) = \lambda C(x_1, \ldots, x_n)$, $\lambda \ge 0$;
so that $C$ must be a positive homogeneous function of the first degree in the $x$.
Any function of this type will be called a $c$ estimator. Its logarithm, $G$, which will
be a $y$ estimator, will satisfy
$$G(\lambda x_1, \ldots, \lambda x_n) = G(x_1, \ldots, x_n) + \log\lambda,$$
and any function of this type will be called a $y$ estimator. Note that $0$ and $\infty$
formally have the $C$ property, while $\pm\infty$ have the $G$ property.
A half line or ray with one end at the origin will be denoted by R if it belongs to 


W,. Any point which lies on some R is called observable. We define the mean 
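value of $H$ on $R$ by
$$E_R(H) = \frac{\int_0^{\infty} F H r^{n-1}\,dr}{\int_0^{\infty} F r^{n-1}\,dr},$$
where
$$r = \sqrt{\xi_1^2 + \cdots + \xi_n^2},$$
the distance of the point $(\xi_1, \ldots, \xi_n)$ from the origin. If $E_R(H)$ has the same value $h$
on every $R$, $E(H) = h$, and if $E_R(H) > h$ (constant), $E(H) > h$. This is easily proved
by changing to spherical polar co-ordinates
$$\xi_1 = r\cos\theta_1,$$
$$\xi_2 = r\sin\theta_1\cos\theta_2,$$
$$\ldots\ldots$$
and remembering that the Jacobian of the transformation is $r^{n-1}$ multiplied by a
function of the $\theta$.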


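For a set of intervals $I'$ determined on $R$, we define $P\{I' \mid R\}$ by
$$P\{I' \mid R\}\int_0^{\infty} F r^{n-1}\,dr = \int_{I'} F r^{n-1}\,dr,$$
and we have, as before, $P\{w'\} = \alpha$ if $P\{I' \mid R\} = \alpha$ (constant) for every $R$, and
$P\{w'\} > \beta$ if $P\{I' \mid R\} > \beta$ (constant), where $w'$ is the region generated by the $I'$,
and $P\{w'\}$ is the probability that the sample point falls in $w'$.
If $(x_1, \ldots, x_n)$ is a fixed point on $R$, the co-ordinates $(\xi_1, \ldots, \xi_n)$ of any point on
$R$ may be expressed in the form
$$\xi_r = x_r e^{y-t} \quad (r = 1, 2, \ldots, n).$$
For points on $R$
$$F = e^{-ny} f(x_1 e^{-t})\cdots f(x_n e^{-t});$$
also
$$E_R(H) = \frac{\int_{-\infty}^{\infty} H e^{-nt} f(x_1 e^{-t})\cdots f(x_n e^{-t})\,dt}{\int_{-\infty}^{\infty} e^{-nt} f(x_1 e^{-t})\cdots f(x_n e^{-t})\,dt}.$$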

If $I$ is a set of intervals in $(-\infty, \infty)$, the points of $R$ corresponding to values of
$t$ lying in $I$ will be called points of acceptance and will form a set of intervals $I'$.
$I$ will be proper if $I'$ is independent of the particular point $(x_1, \ldots, x_n)$ on $R$, the
necessary and sufficient condition for which is that the end-points of $I$ be $y$
estimators (including possibly $\pm\infty$). The relation between $I$ and $\alpha = P\{I' \mid R\}$ is
$$\int_I e^{-nt} f(x_1 e^{-t})\cdots f(x_n e^{-t})\,dt = \alpha\int_{-\infty}^{\infty} e^{-nt} f(x_1 e^{-t})\cdots f(x_n e^{-t})\,dt. \tag{8}$$

The points of acceptance on all the rays $R$ form a region of acceptance $w'(y)$,
and the remainder of the sample space is the critical region $w(y)$. The regions of
acceptance corresponding to different values of $y$ will be similar and similarly
situated, with the origin as centre of similarity. It can be shown that the critical
region obtained by using on every $R$ the shortest $I$ for the corresponding $\alpha$ is
unbiased. An observable point is one for which the integral on the right-hand side
of (8) is not zero.


Finally we obtain this theorem. If $I(x_1, \ldots, x_n)$ is proper and defined at every
observable point, and if
$$k\int_I e^{-ny} f(x_1 e^{-y})\cdots f(x_n e^{-y})\,dy = \alpha \text{ (constant)},$$
where
$$k\int_{-\infty}^{\infty} e^{-ny} f(x_1 e^{-y})\cdots f(x_n e^{-y})\,dy = 1,$$
then
$$P\{y \in I(x_1, \ldots, x_n)\} = \alpha,$$
while if
$$k\int_I e^{-ny} f(x_1 e^{-y})\cdots f(x_n e^{-y})\,dy > \beta \text{ (constant)},$$
then
$$P\{y \in I(x_1, \ldots, x_n)\} > \beta.$$
Again we express all this by saying that the fiducial function for the estimation of
$y$ is
$$g_1(y) = k e^{-ny} f(x_1 e^{-y})\cdots f(x_n e^{-y}),$$
and the continuous distribution with elementary frequency function $g_1(y)$ is
called the fiducial distribution of $y$ determined by $x_1, \ldots, x_n$.


If
$$y \in I(x_1, \ldots, x_n)$$
is equivalent to
$$c \in J(x_1, \ldots, x_n),$$
the end-points of the set of intervals $J$ will be $c$ estimators (including possibly $0$ and
$\infty$), and
$$k\int_I e^{-ny} f(x_1 e^{-y})\cdots f(x_n e^{-y})\,dy = k\int_J c^{-n-1} f(x_1/c)\cdots f(x_n/c)\,dc.$$
$J$ will be said to be proper for the estimation of $c$. The shortest $I$ is determined by
$$\int_I dy$$
a minimum for the corresponding $\alpha$; hence the corresponding $J$ makes
$$\int_J \frac{dc}{c}$$
a minimum. The fiducial function for the estimation of $c$ is
$$g_2(c) = k c^{-n-1} f(x_1/c)\cdots f(x_n/c), \quad c \ge 0,$$
and the last theorem can be stated with $J$, $c$, $g_2(c)$ in place of $I$, $y$, $g_1(y)$ respectively.
The expression for the mean value of $H$ on $R$ is
$$E_R(H) = k\int_{-\infty}^{\infty} H e^{-nt} f(x_1 e^{-t})\cdots f(x_n e^{-t})\,dt.$$
For points on the ray $R$ through $(x_1, \ldots, x_n)$,
$$G(\xi_1, \ldots, \xi_n) - y = G(x_1 e^{y-t}, \ldots, x_n e^{y-t}) - y = G(x_1, \ldots, x_n) - t,$$
where $G$ is any $y$ estimator. Hence for any function $\phi$
$$E_R\{\phi(G-y)\} = k\int_{-\infty}^{\infty} \phi\{G(x_1, \ldots, x_n) - t\}\, e^{-nt} f(x_1 e^{-t})\cdots f(x_n e^{-t})\,dt$$
$$= \int_{-\infty}^{\infty} \phi(G-y)\,g_1(y)\,dy$$
$$= E_f\{\phi(G-y)\}, \tag{9}$$
where it is to be understood that in the last two lines $G$ means $G(x_1, \ldots, x_n)$. In
particular,
$$E_R\{(G-y)^m\} = E_f\{(G-y)^m\}$$
and
$$E_R\{|G-y|^m\} = E_f\{|G-y|^m\}.$$

The factor $k$ in the expression for $g_1(y)$ is evidently a homogeneous function of
degree $n$ in the $x$. Writing $g_1(y)$ in the form $g_1(y, x_1, \ldots, x_n)$ to indicate its
dependence on the $x$, we have
$$g_1(y, \lambda x_1, \ldots, \lambda x_n) = g_1(y - \log\lambda, x_1, \ldots, x_n).$$
Hence multiplying each of the numbers $x_1, \ldots, x_n$ by the same number $\lambda$ will
simply shift the fiducial distribution of $y$, without change of form, through a
distance $\log\lambda$ in the positive direction; therefore the mean, median, etc. of the
fiducial distribution of $y$ all have the $G$ property.


$G_C$, the median of the fiducial distribution, is the closest estimator of $y$, and
the estimator with the smallest mean absolute error. The median value of its
distribution is $y$.
$$G_M = E_f(y) = \int_{-\infty}^{\infty} y\,g_1(y)\,dy = \int_0^{\infty} (\log c)\,g_2(c)\,dc$$
will be the $y$ estimator with the smallest mean square error,
$$E\{(G_M - y)^2\} < E\{(G - y)^2\}.$$
Its mean value is $y$. $G_L$, the maximum likelihood estimator, is defined as the
value of $y$ which makes $g_1(y)$ a maximum, and we can define $G_r$ as the estimator
with the smallest mean $r$th power absolute error.


The mean, median, etc. of the fiducial distribution of $c$ are $c$ estimators; but
the relations of the $c$ estimators to one another are not as simple as those of the
$y$ estimators. The median, $C_C$, is the closest estimator of $c$. Its median value is $c$,
and its logarithm is $G_C$; but it is not in general the $c$ estimator with the smallest
mean absolute error. Again, the mean value of the $c$ estimator with the smallest
mean square error is not $c$. These complications arise from the fact that the
relation corresponding to (9) is
$$E_R\{\phi(C/c)\} = E_f\{\phi(C/c)\},$$
which is obtained from (9) by replacing $\phi(G-y)$ by $\phi(e^{G-y})$, $= \phi(C/c)$. Hence
$$\frac{E_R\{(C-c)^m\}}{c^m} = E_f\left\{\frac{(C-c)^m}{c^m}\right\} = \int_0^{\infty} \frac{(C-c)^m}{c^m}\,g_2(c)\,dc.$$

For the estimator with the smallest mean square error, we must have $E_R\{(C-c)^2\}$
a minimum, and therefore
$$\int_0^{\infty} g_2(c)\,\frac{(C-c)^2}{c^2}\,dc$$
is a minimum; hence
$$\int_0^{\infty} g_2(c)\,\frac{(C-c)}{c^2}\,dc = 0,$$
that is
$$C\,E_f(1/c^2) - E_f(1/c) = 0.$$
Thus $C_M$, the $c$ estimator with the smallest mean square error, is defined by
$$C_M = \frac{E_f(1/c)}{E_f(1/c^2)}.$$
Since
$$E_R(C - c) = c\,E_f\left\{\frac{C-c}{c}\right\} = c\{C\,E_f(1/c) - 1\},$$
$$E_R(C_M - c) = c\,\frac{\{E_f(1/c)\}^2 - E_f(1/c^2)}{E_f(1/c^2)} < 0,$$
and therefore $E(C_M - c) < 0$.
A sufficient condition for $E(C^m) = c^m$
is
$$C^m = \frac{1}{E_f(1/c^m)}.$$

Before leaving the general theory we note that if f(z) has the property x, of § 1 
and is bounded in the neighbourhood of 0, the first n — 1 moments of the fiducial 
distribution of c are finite for all values of the x when f(x) is bounded and for almost 
all values when f(x) is unbounded, and that if it has the property «,, the first 
n—1 moments of the fiducial distribution of y are finite for all, or almost all, 
values of the x. 

If $X$ is normally distributed about 0 with standard deviation $c$, its probability
function is
$$\frac{1}{c\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2/c^2},$$
and
$$g_2(c) = \frac{k}{c^{n+1}}\,e^{-\frac{1}{2}S/c^2}, \quad c \ge 0,$$
where $S = \sum x_r^2$. If $h$ is any positive homogeneous function of degree 0 in the $x$,
$C = \sqrt{\tfrac{1}{2}S/h}$ is a $c$ estimator. Hence if we determine $h$ so that
$$\int_C^{\infty} g_2(c)\,dc = \alpha, \tag{10}$$
we shall have $P\{c \ge C\} = \alpha$,
that is
$$P\{\tfrac{1}{2}S/c^2 \le h\} = \alpha. \tag{11}$$
By the substitution $\tfrac{1}{2}S/c^2 = u$, (10) becomes
$$\frac{1}{\Gamma(\tfrac{1}{2}n)}\int_0^{h} e^{-u} u^{\frac{1}{2}n-1}\,du = \alpha, \tag{12}$$
so that $h$ is constant. Looking at it the other way, we see that if $h$ is any given
positive number and $\alpha$ is determined by (12), then (11) is true. In other words, for
a fixed normal population of mean 0 and standard deviation $c$, the chance
variable $\tfrac{1}{2}S/c^2$ has a $\Gamma(\tfrac{1}{2}n)$ distribution, as is well known.


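If
$$\frac{1}{\Gamma(\tfrac{1}{2}n)}\int_{h_1}^{h_2} e^{-u} u^{\frac{1}{2}n-1}\,du = \alpha, \tag{13}$$
then $P\{h_1 \le \tfrac{1}{2}S/c^2 \le h_2\} = \alpha$,
that is $P\{\tfrac{1}{2}S/h_2 \le c^2 \le \tfrac{1}{2}S/h_1\} = \alpha$,
which gives $P\{\tfrac{1}{2}\log(\tfrac{1}{2}S/h_2) \le y \le \tfrac{1}{2}\log(\tfrac{1}{2}S/h_1)\} = \alpha$.


Thus fiducial ranges for $c^2$ and $y$ can be determined for any given value of $\alpha$.
For the shortest range of $y$, which gives an unbiased critical region, we must have
$$\tfrac{1}{2}\log(\tfrac{1}{2}S/h_1) - \tfrac{1}{2}\log(\tfrac{1}{2}S/h_2) \text{ a minimum},$$
and therefore
$$\log h_2 - \log h_1 \text{ a minimum}. \tag{14}$$
From (13)
$$e^{-h_2} h_2^{\frac{1}{2}n-1}\,dh_2 - e^{-h_1} h_1^{\frac{1}{2}n-1}\,dh_1 = 0,$$
and from (14)
$$\frac{dh_2}{h_2} - \frac{dh_1}{h_1} = 0;$$
therefore
$$e^{-h_1} h_1^{\frac{1}{2}n} = e^{-h_2} h_2^{\frac{1}{2}n}. \tag{15}$$
The critical region corresponding to values of $h_1$, $h_2$ determined by (13) and (15) is
unbiased.*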
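
Equations (13) and (15) determine $h_1$ and $h_2$ only implicitly, so a numerical solution is needed in practice. The following is a minimal sketch of one way to obtain it, not a method given in the text: for each trial $h_1$ below $\tfrac{1}{2}n$ the partner $h_2$ satisfying (15) is found by bisection, and $h_1$ is then adjusted until (13) holds. All function names are hypothetical, Simpson's rule stands in for a proper incomplete gamma routine, and the sketch assumes $n \ge 2$ so that the integrand is bounded.

```python
import math

def gamma_density(u, shape):
    """Density of the Gamma(shape) distribution at u > 0."""
    return math.exp(-u + (shape - 1.0) * math.log(u) - math.lgamma(shape))

def coverage(h1, h2, shape, panels=2000):
    """Left-hand side of (13) by Simpson's rule (panels must be even)."""
    step = (h2 - h1) / panels
    total = gamma_density(h1, shape) + gamma_density(h2, shape)
    for i in range(1, panels):
        weight = 4.0 if i % 2 else 2.0
        total += weight * gamma_density(h1 + i * step, shape)
    return total * step / 3.0

def log_q(h, shape):
    """Logarithm of exp(-h) * h**shape, the quantity equated in (15)."""
    return -h + shape * math.log(h)

def partner(h1, shape):
    """The h2 > shape satisfying (15) for a given h1 < shape."""
    target = log_q(h1, shape)
    lo, hi = shape, 2.0 * shape + 1.0
    while log_q(hi, shape) > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_q(mid, shape) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def shortest_y_limits(n, alpha):
    """Solve (13) and (15) for (h1, h2); requires n >= 2 in this sketch."""
    shape = 0.5 * n
    lo, hi = 1e-9, shape           # h1 lies below the maximum of q at n/2
    for _ in range(60):
        h1 = 0.5 * (lo + hi)
        if coverage(h1, partner(h1, shape), shape) > alpha:
            lo = h1                # too much probability: raise h1
        else:
            hi = h1
    h1 = 0.5 * (lo + hi)
    return h1, partner(h1, shape)

h1, h2 = shortest_y_limits(10, 0.95)
print(h1, h2)
```

The fiducial range for $y$ then runs from $\tfrac{1}{2}\log(\tfrac{1}{2}S/h_2)$ to $\tfrac{1}{2}\log(\tfrac{1}{2}S/h_1)$ for the observed $S$.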


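The estimators discussed above are all simply expressible in terms of $S$.
$$G_M = E_f(y) = E_f\{\tfrac{1}{2}\log(\tfrac{1}{2}S/u)\} = \tfrac{1}{2}\{\log\tfrac{1}{2}S - E_f(\log u)\},$$
$$E_f(\log u) = \frac{1}{\Gamma(\tfrac{1}{2}n)}\int_0^{\infty} e^{-u} u^{\frac{1}{2}n-1}\log u\,du = \frac{d}{d(\tfrac{1}{2}n)}\log\Gamma(\tfrac{1}{2}n),$$
therefore
$$G_M = \tfrac{1}{2}\left\{\log\tfrac{1}{2}S - \frac{d}{d(\tfrac{1}{2}n)}\log\Gamma(\tfrac{1}{2}n)\right\}.$$
When $n$ is large, this is approximately
$$\tfrac{1}{2}(\log S - \log 2 - \log\tfrac{1}{2}n) = \tfrac{1}{2}\log(S/n).$$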
* Cf. Neyman & Pearson (1936, p. 19), where $\tfrac{1}{2}v_1$, $\tfrac{1}{2}v_2$ take the place of $h_1$, $h_2$.
Denote the median of the $\Gamma(m)$ distribution by $h(m)$; it is approximately equal
to $m - \tfrac{1}{3}$. The fiducial median value of $u$, $= \tfrac{1}{2}S/c^2$, is $h(\tfrac{1}{2}n)$; hence
$$C_C = \sqrt{\tfrac{1}{2}S/h(\tfrac{1}{2}n)}$$
and
$$G_C = \tfrac{1}{2}\log\{\tfrac{1}{2}S/h(\tfrac{1}{2}n)\}.$$
The closest estimator of $c^2$ is $C_C^2 = \tfrac{1}{2}S/h(\tfrac{1}{2}n)$,
which is approximately $S/(n - \tfrac{2}{3})$.
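
The quality of the approximation $h(m) \approx m - \tfrac{1}{3}$ can be examined numerically. The sketch below is an illustration under stated assumptions: the regularized incomplete gamma function is evaluated by its standard power series, the median is found by bisection, and the function names and sample values are hypothetical.

```python
import math

def gamma_cdf(x, m, terms=500):
    """P{u <= x} for u ~ Gamma(m), using the power series
    x**m * exp(-x) * sum_k x**k / (m (m+1) ... (m+k)) / Gamma(m)."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / m
    total = term
    for k in range(1, terms):
        term *= x / (m + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(-x + m * math.log(x) - math.lgamma(m))

def gamma_median(m):
    """h(m): the median of the Gamma(m) distribution, by bisection."""
    lo, hi = 0.0, m + 10.0 * math.sqrt(m) + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, m) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def closest_variance_estimate(xs):
    """C_C squared = (S/2) / h(n/2), with S the sum of squares about the known mean 0."""
    s = sum(x * x for x in xs)
    return 0.5 * s / gamma_median(0.5 * len(xs))

for m in (1.0, 2.0, 5.0, 10.0):
    print(m, gamma_median(m), m - 1.0 / 3.0)

sample = [1.0, -2.0, 0.5, 1.5]     # hypothetical observations, mean known to be 0
print(closest_variance_estimate(sample), sum(x * x for x in sample) / (4 - 2.0 / 3.0))
```

For $m = 1$ the exact median is $\log 2$, which provides an independent check on the routine; the agreement with $m - \tfrac{1}{3}$ improves as $m$ grows.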

The c? estimator usually employed is 
S/(n—1); 
its mean value is c?. 

The simplicity of this case arises from the fact that the fiducial distribution
(of $c$ or $y$) depends on $x_1, \ldots, x_n$ only through the value of $S$. When $S$ is fixed,
the fiducial distribution is the same no matter what the individual values of the $x$
may be. The important estimators and fiducial ranges are all functions of $S$ only.
$S$ is what is called a sufficient statistic* for the estimation of $c$ or $y$. Other cases


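which are equally simple because of the existence of a sufficient statistic are the
generalized gamma distribution,
$$\frac{1}{c} f\left(\frac{x}{c}\right) = \frac{1}{c\,\Gamma(m)}\left(\frac{x}{c}\right)^{m-1} e^{-x/c}, \quad x \ge 0,$$
$$= 0, \quad x < 0,$$
and the rectangular distribution,
$$\frac{1}{c} f\left(\frac{x}{c}\right) = \frac{1}{c}, \quad 0 \le x \le c,$$
$$= 0, \quad x < 0 \text{ or } x > c.$$
The fiducial functions for the estimation of $c$ are respectively
$$g_2(c) = \frac{k}{c^{mn+1}}\,e^{-n\bar{x}/c}, \quad c \ge 0,$$
and
$$g_2(c) = \frac{n x_L^n}{c^{n+1}}, \quad c \ge x_L,$$
$$= 0, \quad c < x_L.$$

The sufficient statistics are $\bar{x}$ and $x_L$.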

While the existence of a sufficient statistic simplifies the mathematics and 
enables us to obtain explicit expressions for the important estimators and for the 
fiducial ranges, the methods of this and the preceding section are in no way 
dependent on this existence. When the sample values have been observed, the 
fiducial distribution is determinate, and it is theoretically possible to obtain the 
values of $A_C$, $A_M$, etc. or of $G_C$, $G_M$, $C_C$, etc., as the case may be, or the values of
the end-points of the fiducial ranges $I$ or $J$, to any required degree of accuracy.
With small samples the labour would not be great. A practical process to deal 
with large samples would depend on a simple approximation to the fiducial 
distribution; but it is not proposed to discuss that aspect of the problem here. 


* See Neyman & Pearson (1936, p. 117) and Pitman (1936).
4. THE ESTIMATION OF a AND c
The probability function of $X$ is assumed to be
$$\frac{1}{c} f\left(\frac{x-a}{c}\right),$$
the function $f(x)$ being known but the parameters $a$ and $c$ both unknown, and $c$
positive. Thus
$$F = \frac{1}{c^n} f\left(\frac{\xi_1-a}{c}\right)\cdots f\left(\frac{\xi_n-a}{c}\right).$$

In practical problems of estimation, the chance variable $X$ will be the measure
of some physical quantity, and $a$ and $c$ will be the measures of quantities of the
same kind as $X$. Hence any function of the observed values $x_1, \ldots, x_n$ whose value
may be used as an estimate of $a$, i.e. any $a$ estimator, $A$, must be homogeneous of
the first degree in the $x$.* Also, it must still satisfy the relation of § 2,
$$A(x_1 + \lambda, \ldots, x_n + \lambda) = A(x_1, \ldots, x_n) + \lambda.$$
Combining these two, we have
$$A(\lambda x_1 + \mu, \ldots, \lambda x_n + \mu) = \lambda A(x_1, \ldots, x_n) + \mu.$$
Any function of this type will be called an $a$ estimator and will be denoted by $A$.
The probability function of the chance variable $X + k$, where $k$ is a constant, will
differ from the probability function of $X$ only in the value of $a$; hence any $c$
estimator, $C$, in addition to being positive homogeneous of the first degree in the
$x$, must also be invariant with respect to change of origin, and therefore
$$C(\lambda x_1 + \mu, \ldots, \lambda x_n + \mu) = \lambda C(x_1, \ldots, x_n), \quad \lambda \ge 0.$$
Any such function will be called a $c$ estimator.


The change of co-ordinates required is a combination of those used in §§ 2
and 3;
$$\xi_1 = \xi_1,$$
$$\xi_2 = \xi_1 + r\cos\theta_1, \quad (r > 0)$$
$$\xi_3 = \xi_1 + r\sin\theta_1\cos\theta_2,$$
$$\ldots\ldots$$


* This restriction was not made in § 2 for the following reason. From consideration of
dimensions it is evident that the probability function of $X$ in § 2 must really be of the form
$\frac{1}{c} f\left(\frac{x-a}{c}\right)$,
where $c$ is the measure of some quantity of the same kind as the quantities whose measures are
$X$ and $a$; but since $c$ was supposed to be known it was absorbed in the functional symbol $f$ by
writing the probability function in the form $f(x-a)$. All that can be said about the dimensions of
an $a$ estimator is that it must be homogeneous of the first degree in $c, x_1, \ldots, x_n$, and this does not
restrict its degree in the $x$ only.
The relation of
$$\xi_2 - \xi_1, \ldots, \xi_n - \xi_1$$
to
$$r, \theta_1, \ldots, \theta_{n-2}$$
is the relation of rectangular Cartesians to spherical polars in $n-1$ dimensions.
The Jacobian of the transformation is
$$\frac{\partial(\xi_1, \xi_2, \ldots, \xi_n)}{\partial(\xi_1, r, \theta_1, \ldots, \theta_{n-2})} = r^{n-2}\,\Theta,$$
where $\Theta$ is a function of the $\theta$ only.

The locus
$$\theta_1, \ldots, \theta_{n-2}, \text{ all constant},$$
is a (two-dimensional) half-plane with the line
$$\xi_1 = \xi_2 = \cdots = \xi_n \tag{16}$$
as its edge. That the locus consists of a half-plane only can be seen as follows.
Any (two-dimensional) plane through the line (16) consists of two half-planes
which join along this line. These half-planes are distinguished from one another
by the signs of
$$\xi_2 - \xi_1, \; \xi_3 - \xi_1, \; \ldots, \; \xi_n - \xi_1.$$
These signs do not change over one half-plane; but they all change as the point
$(\xi_1, \ldots, \xi_n)$ moves from one half-plane to the other. When the $\theta$ are all fixed, the
signs of
$$\xi_2 - \xi_1, \; \xi_3 - \xi_1, \; \ldots, \; \xi_n - \xi_1$$
are all fixed because $r$ is positive, therefore the locus consists of a half-plane only.
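We denote any such half-plane by $Q$, and define the mean value of $H$ on $Q$ by
$$E_Q(H) = \frac{\iint_Q F H r^{n-2}\,d\xi_1\,dr}{\iint_Q F r^{n-2}\,d\xi_1\,dr}. \tag{17}$$
It is then easy to show that $E(H) = h$ if $E_Q(H) = h$ (constant) on every $Q$, and
$E(H) > h$ if $E_Q(H) > h$ on every $Q$.
If $D'$ is any region in $Q$, we define $P\{D' \mid Q\}$ by
$$P\{D' \mid Q\} = \frac{\iint_{D'} F r^{n-2}\,d\xi_1\,dr}{\iint_Q F r^{n-2}\,d\xi_1\,dr}. \tag{18}$$
Obviously $P\{w'\} = \alpha$ if $P\{D' \mid Q\} = \alpha$ (constant) on every $Q$, and $P\{w'\} > \beta$ if
$P\{D' \mid Q\} > \beta$ (constant) on every $Q$, where $w'$ is the region in $W$ formed by all the
$D'$, and $P\{w'\}$ is the probability that the sample point falls in $w'$. Since $\theta_1$ is
constant on $Q$, and
$$\xi_2 - \xi_1 = r\cos\theta_1,$$
we may write (17) and (18) in the more convenient forms*
$$E_Q(H) = \frac{\iint_Q (\xi_2 - \xi_1)^{n-2} F H\,d\xi_1\,d\xi_2}{\iint_Q (\xi_2 - \xi_1)^{n-2} F\,d\xi_1\,d\xi_2}$$
and
$$P\{D' \mid Q\} = \frac{\iint_{D'} (\xi_2 - \xi_1)^{n-2} F\,d\xi_1\,d\xi_2}{\iint_Q (\xi_2 - \xi_1)^{n-2} F\,d\xi_1\,d\xi_2}.$$

The co-ordinates $(\xi_1, \ldots, \xi_n)$ of any point on the half-plane $Q$† through the
point $(x_1, \ldots, x_n)$ may be expressed in the form
$$\xi_r = a + \frac{c\,(x_r - u)}{v} \quad (r = 1, 2, \ldots, n).$$
Since $v$ is equal to $c(x_2 - x_1)/(\xi_2 - \xi_1)$, it will always be positive. Note that, at
points on $Q$,
$$\frac{A(\xi_1, \ldots, \xi_n) - a}{c} = \frac{A(x_1, \ldots, x_n) - u}{v}, \tag{19}$$
and similarly
$$\frac{C(\xi_1, \ldots, \xi_n)}{c} = \frac{C(x_1, \ldots, x_n)}{v}. \tag{20}$$
Since
$$\frac{\partial(\xi_1, \xi_2)}{\partial(u, v)} = \frac{c^2 (x_2 - x_1)}{v^3},$$
$$(\xi_2 - \xi_1)^{n-2} F\,d\xi_1\,d\xi_2 = \frac{(x_2 - x_1)^{n-1}}{v^{n+1}}\, f\left(\frac{x_1 - u}{v}\right)\cdots f\left(\frac{x_n - u}{v}\right) du\,dv.$$
Write
$$g(u, v) = \frac{k}{v^{n+1}}\, f\left(\frac{x_1 - u}{v}\right)\cdots f\left(\frac{x_n - u}{v}\right),$$
where $k$ is defined by
$$\int_0^{\infty}\int_{-\infty}^{\infty} g(u, v)\,du\,dv = 1;$$
then
$$E_Q(H) = \int_0^{\infty}\int_{-\infty}^{\infty} H g(u, v)\,du\,dv.$$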
0 


We may specify a pair of values of $u$, $v$ by a point in a plane, the parameter
plane y, whose Cartesian co-ordinates are $(u, v)$. If $D$ is a region in y, the points


* If $Q$ happens to lie in the hyper-plane $\xi_2 = \xi_1$, we must replace $\xi_2$ by $\xi_r$, where $\xi_r - \xi_1$ is not
zero at all points of $Q$.


† $Q$ is the locus of the sample point corresponding to a given "configuration" (Fisher, 1934,
p. 304).
of $Q$ corresponding to points $(u, v)$ lying in $D$ will be called points of acceptance,
and will form a domain $D'$, it being understood that $D$ is proper, i.e. that $D'$ is
independent of the particular point $(x_1, \ldots, x_n)$ on $Q$. This will be so if the boundary
curves of $D$ have equations of the form
$$\psi\left(\frac{x_1 - u}{v}, \ldots, \frac{x_n - u}{v}\right) = 0.$$
In particular, the straight lines
$$u = A(x_1, \ldots, x_n)$$
and
$$v = C(x_1, \ldots, x_n)$$
are suitable boundary curves, as may be seen by writing their equations in the
form
$$A\left(\frac{x_1 - u}{v}, \ldots, \frac{x_n - u}{v}\right) = 0, \quad C\left(\frac{x_1 - u}{v}, \ldots, \frac{x_n - u}{v}\right) = 1.$$
The necessary and sufficient condition for the point $(x_1, \ldots, x_n)$ to be a point of
acceptance is
$$(a, c) \in D(x_1, \ldots, x_n),$$
and the relation between $D$ and $\alpha = P\{D' \mid Q\}$ is
$$\iint_D g(u, v)\,du\,dv = \alpha.$$
D 


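Replacing the symbols $u$, $v$ in this equation by $a$, $c$, we may state that the fiducial
distribution of $a$ and $c$ is determined by the fiducial function
$$g(a, c) = \frac{k}{c^{n+1}}\, f\left(\frac{x_1 - a}{c}\right)\cdots f\left(\frac{x_n - a}{c}\right).$$
This means simply that if $D(x_1, \ldots, x_n)$ is proper, and defined at every point
$(x_1, \ldots, x_n)$, and if
$$\iint_D g(a, c)\,da\,dc = \alpha \text{ (constant)},$$
then
$$P\{(a, c) \in D(x_1, \ldots, x_n)\} = \alpha,$$
while if
$$\iint_D g(a, c)\,da\,dc > \beta \text{ (constant)},$$
then
$$P\{(a, c) \in D(x_1, \ldots, x_n)\} > \beta.$$

The mean value theorems are obtained by using (19) and (20).
$$E_Q\left\{\phi\left(\frac{A - a}{c}\right)\right\} = \int_0^{\infty}\int_{-\infty}^{\infty} \phi\left(\frac{A(x_1, \ldots, x_n) - u}{v}\right) g(u, v)\,du\,dv$$
$$= \int_0^{\infty}\int_{-\infty}^{\infty} \phi\left(\frac{A - a}{c}\right) g(a, c)\,da\,dc,$$
which we denote by
$$E_f\left\{\phi\left(\frac{A - a}{c}\right)\right\}, \tag{21}$$
it being understood that in the last two lines $A$ means $A(x_1, \ldots, x_n)$. Similarly
$$E_Q\{\phi(C/c)\} = E_f\{\phi(C/c)\}.$$
Make the region $D$ in the $(a, c)$ plane consist of a strip or set of strips parallel
to the $c$-axis and extending from $c = 0$ to $c = \infty$, with boundary lines,
$$a = A(x_1, \ldots, x_n).$$
The intersection of $D$ with the $a$-axis is a set of intervals $I$ whose end-points are
$a$ estimators. The expression for $\alpha = P\{D' \mid Q\}$ is now
$$\int_I\int_0^{\infty} g(a, c)\,dc\,da = \int_I g_1(a)\,da,$$
where
$$g_1(a) = \int_0^{\infty} g(a, c)\,dc.$$
The statement $(a, c) \in D$
becomes $a \in I$.
Hence the fiducial function for the estimation of $a$ is
$$g_1(a) = k\int_0^{\infty} \frac{1}{c^{n+1}}\, f\left(\frac{x_1 - a}{c}\right)\cdots f\left(\frac{x_n - a}{c}\right) dc.$$

The mean, median, etc. of this fiducial distribution of $a$ are $a$ estimators; but,
owing to the denominator $c$ in (21), the relations of these estimators to one
another are not in general as simple as the relations of the estimators in § 2. $A_C$,
the median of the fiducial distribution of $a$, is obviously the closest estimator of $a$,
and its median value is $a$; but it is not in general the estimator with the smallest
mean absolute error.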

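For the estimator with smallest mean square error, we must have
$$E_f\left\{\frac{(A - a)^2}{c^2}\right\} = \int_0^{\infty}\int_{-\infty}^{\infty} \frac{(A - a)^2}{c^2}\,g(a, c)\,da\,dc$$
a minimum, which requires
$$\int_0^{\infty}\int_{-\infty}^{\infty} \frac{A - a}{c^2}\,g(a, c)\,da\,dc = 0,$$
that is
$$A\,E_f(1/c^2) - E_f(a/c^2) = 0.$$
Thus the required estimator is
$$A_M = \frac{E_f(a/c^2)}{E_f(1/c^2)}.$$
Its mean value is not necessarily $a$.
In the same way we can show that the fiducial function for the estimation of
$c$ is
$$g_2(c) = \int_{-\infty}^{\infty} g(a, c)\,da = \frac{k}{c^{n+1}}\int_{-\infty}^{\infty} f\left(\frac{x_1 - a}{c}\right)\cdots f\left(\frac{x_n - a}{c}\right) da.$$
Since the mean value theorem for $c$ estimators,
$$E_Q\{\phi(C/c)\} = E_f\{\phi(C/c)\} = \int_0^{\infty}\int_{-\infty}^{\infty} \phi(C/c)\,g(a, c)\,da\,dc$$
$$= \int_0^{\infty} \phi(C/c)\,g_2(c)\,dc,$$
is the same as in § 3, the relations of the $c$ estimators to one another will be the same
here as there. The properties of the $y$ ($= \log c$) estimators will be simpler than those
of the $c$ estimators. $G_M$, $G_C$, $G_L$, $C_C$ have the same properties as in § 3, e.g.
$$G_M = E_f(\log c) = \int_0^{\infty} (\log c)\,g_2(c)\,dc$$
has mean value $y$, and is the $y$ estimator with the smallest mean square error.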
If a statement is to be made about both $a$ and $c$ with a given fiducial probability,
we cannot simply combine the separate statements about $a$ and $c$; we must use
the fiducial function $g(a, c)$. This is what is required in statistical tests involving
both $a$ and $c$. When $D$ is defined at every point, the points of acceptance form a
region of acceptance $w'(a, c)$, and the remainder of the sample space is the critical
region $w(a, c)$. Suppose now that $\alpha$ is fixed, and that $D(x_1, \ldots, x_n)$ is defined at


every point by
$$\iint_D g(u, v)\,du\,dv = \alpha,$$
$$\iint_D \frac{du\,dv}{v^2} = \iint_D \frac{1}{v}\,du\,d(\log v) \text{ a minimum};$$


it can easily be shown that $D$ so defined is proper. We take a random sample of $n$
values of $X$ and then make the statement that
$$(a, c) \in D(x_1, \ldots, x_n). \tag{22}$$
We know that the probability of making a true statement in this way is $\alpha$, no
matter what the actual values of $a$ and $c$ may be. Suppose further that the
purpose of our observations is to test the hypothesis that $a$ and $c$ have certain
specified values, $a = a_0$, $c = c_0$. If $(a_0, c_0)$ does not lie in $D$ as thus determined by
the sample values, the statement (22) contradicts the hypothesis and we therefore
reject the hypothesis. If $(a_0, c_0)$ does lie in $D$, the hypothesis is not contradicted
by (22) and we accept it. The probability of rejecting a hypothesis when it is
actually true will be $1 - \alpha$. In terms of the sample space,* the hypothesis $a = a_0$,
$c = c_0$ is accepted if the sample point falls in the region of acceptance $w'(a_0, c_0)$, and
rejected if the point falls in the critical region $w(a_0, c_0)$. If $D$ is defined as above, it
can be shown that the probability that the sample point falls in $w'(a_0, c_0)$ is a
maximum when $a = a_0$, $c = c_0$, and therefore the hypothesis $a = a_0$, $c = c_0$ is more
likely to be accepted when it is true than when it is false. The critical region
determined in this way is unbiased. Further discussion of critical regions and of
statistical tests associated with them must be reserved for another paper.
Applying the theory to a normal population with probability function,

\[ \frac{1}{c\sqrt{(2\pi)}}\,e^{-(x-a)^2/2c^2}, \]

we have

\[ g(a,c) = \frac{k}{c^{n+1}}\,e^{-\{S+n(\bar{x}-a)^2\}/2c^2}, \]


* Using the ideas of Neyman & Pearson (1936) and Neyman (1937).

where $S = \sum (x_r - \bar{x})^2$ and $\bar{x}$ is the sample mean. Hence

\[ g_1(a) = \int_0^\infty g(a,c)\,dc = \frac{k'}{\{S/n + (\bar{x}-a)^2\}^{\frac12 n}}. \]

The fiducial distribution of a is symmetrical, with its mode at $\bar{x}$. $A_C = \bar{x}$, and we
can show that $A_M = \bar{x}$, but there is no need to do this, for we have already done it
in §2. The proof given there that $\bar{x}$ is the best estimator still holds good, for in § 2
we were comparing $\bar{x}$ with a wider class of estimators which included all the
estimators of this section.


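If $h_1$, $h_2$ are fixed numbers, $h_1 < h_2$,

\[ P\{\bar{x} + h_1\sqrt{(S/n)} \le a \le \bar{x} + h_2\sqrt{(S/n)}\} = \int_{h_1}^{h_2} \frac{dz}{(1+z^2)^{\frac12 n}} \Big/ \int_{-\infty}^{\infty} \frac{dz}{(1+z^2)^{\frac12 n}}, \]

which is “Student’s” result. For a given value $\alpha$ of the last expression, the
fiducial range of a will be shortest when $h_2 - h_1$ is least, i.e. when $h_1 = -h_2$. Thus if

\[ \int_{-h_2}^{h_2} \frac{dz}{(1+z^2)^{\frac12 n}} \Big/ \int_{-\infty}^{\infty} \frac{dz}{(1+z^2)^{\frac12 n}} = \alpha, \]

then

\[ P\{\bar{x} - h_2\sqrt{(S/n)} \le a \le \bar{x} + h_2\sqrt{(S/n)}\} = \alpha, \]

and this is the shortest fiducial range for given $\alpha$.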
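
As a numerical check on the result above, the following Python sketch evaluates the fiducial probability for given limits and solves for the half-width of the shortest range. It assumes that the fiducial density of z = (a - x̄)/√(S/n) is proportional to (1 + z²)^(-n/2), with normalising constant B(½, ½(n-1)); the function names are illustrative only and the integration is a plain Simpson's rule.

```python
import math

def fiducial_density(z, n):
    """Normalised density of z = (a - xbar)/sqrt(S/n), proportional to (1 + z^2)^(-n/2)."""
    # The integral of (1 + z^2)^(-n/2) over the real line is B(1/2, (n - 1)/2).
    beta = math.gamma(0.5) * math.gamma((n - 1) / 2) / math.gamma(n / 2)
    return (1.0 + z * z) ** (-n / 2) / beta

def fiducial_probability(h1, h2, n, steps=2000):
    """P{xbar + h1*sqrt(S/n) <= a <= xbar + h2*sqrt(S/n)} by Simpson's rule."""
    if steps % 2:
        steps += 1
    width = (h2 - h1) / steps
    total = fiducial_density(h1, n) + fiducial_density(h2, n)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * fiducial_density(h1 + i * width, n)
    return total * width / 3.0

def shortest_half_width(alpha, n, tol=1e-10):
    """Solve fiducial_probability(-h, h, n) = alpha for h by bisection."""
    lo, hi = 0.0, 1.0
    while fiducial_probability(-hi, hi, n) < alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fiducial_probability(-mid, mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

n = 10
h = shortest_half_width(0.95, n)
# h * sqrt(n - 1) is the corresponding value of "Student's" t on n - 1 degrees
# of freedom; for n = 10 it should agree with the tabulated 5% point, about 2.262.
print(h, h * math.sqrt(n - 1))
```

The conversion t = z√(n-1) follows from writing the usual variance estimate as S/(n-1), so the range x̄ ± h√(S/n) is the familiar t interval in another scale.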
For the estimation of c we have

\[ g_2(c) = \int_{-\infty}^{\infty} g(a,c)\,da = \frac{k''}{c^{n}}\,e^{-S/2c^2}. \]

This is the same as for the normal population in §3 except that S has a
different meaning and n is replaced by n−1. Thus the estimation of c from a
sample of n from a normal population of unknown mean is essentially the same
as the estimation of c from a sample of n − 1 from a population of known mean.

Suppose that l 
f(x) = £20, m>0, 


=O 
and that we have a sample of n from the generalized gamma population with
probability function 1 


The expression for g(a, c) is 


k n ( 
=0, 
where $x_s$ is the smallest sample value. Integrating from 0 to ∞ with respect to c,
we obtain
= 


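= 0,
In the particular case of the exponential population, m=1, the probability
function is

\[ \frac{1}{c}\,e^{-(x-a)/c}, \quad x \ge a, \]

and 0, $x < a$. Hence

\[ g(a,c) = \frac{k}{c^{n+1}}\,e^{-n(\bar{x}-a)/c}, \quad a \le x_s, \]
\[ \phantom{g(a,c)} = 0, \quad a > x_s. \]

Further

\[ g_1(a) = \frac{k'}{(\bar{x}-a)^n} = \frac{(n-1)(\bar{x}-x_s)^{n-1}}{(\bar{x}-a)^n}, \quad a \le x_s, \]
\[ \phantom{g_1(a)} = 0, \quad a > x_s; \]

the median $A_C$ is given by

\[ \left(\frac{\bar{x}-x_s}{\bar{x}-A_C}\right)^{n-1} = \tfrac12. \]

Thus the closest estimator of a is*

\[ A_C = \bar{x} - 2^{1/(n-1)}(\bar{x} - x_s). \]

The estimator with the smallest mean square error is

\[ A_M = \frac{E_Q(a/c^2)}{E_Q(1/c^2)}, \]

which is easily shown to be

\[ \bar{x} - (1+1/n)(\bar{x} - x_s). \]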
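
The two estimators for the exponential population can be checked numerically. The sketch below assumes that the fiducial density of a is (n-1)(x̄ - x_s)^(n-1)/(x̄ - a)^n for a ≤ x_s, and that, after c has been integrated out, the weight attached to a in E(1/c²) is proportional to (x̄ - a)^-(n+2). The sample values and function names are hypothetical.

```python
def closest_estimator(xbar, x_min, n):
    """Median of the fiducial distribution of a."""
    return xbar - 2.0 ** (1.0 / (n - 1)) * (xbar - x_min)

def min_mse_estimator(xbar, x_min, n):
    """Closed form for the estimator with smallest mean square error."""
    return xbar - (1.0 + 1.0 / n) * (xbar - x_min)

def fiducial_cdf(a, xbar, x_min, n):
    """Fiducial probability that the parameter does not exceed a."""
    if a >= x_min:
        return 1.0
    return ((xbar - x_min) / (xbar - a)) ** (n - 1)

def min_mse_numeric(xbar, x_min, n, steps=20000):
    """E(a/c^2)/E(1/c^2) by the midpoint rule.

    With u = (xbar - x_min)/(xbar - a), 0 < u <= 1, the weight
    (xbar - a)^-(n+2) da becomes proportional to u^n du.
    """
    d = xbar - x_min
    numerator = denominator = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        weight = u ** n
        numerator += (xbar - d / u) * weight
        denominator += weight
    return numerator / denominator

sample = [2.1, 3.4, 2.6, 5.0, 2.9]  # hypothetical observations
n = len(sample)
xbar = sum(sample) / n
x_min = min(sample)
a_closest = closest_estimator(xbar, x_min, n)
a_mse = min_mse_estimator(xbar, x_min, n)
print(a_closest, fiducial_cdf(a_closest, xbar, x_min, n))  # second value should be 0.5
print(a_mse, min_mse_numeric(xbar, x_min, n))              # the two values should agree
```

Both estimators lie below the smallest observation, as they must, since no value of a above x_s is possible.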
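The fiducial distribution of a is unimodal with its maximum at the upper end-point
$x_s$. Hence the shortest fiducial range has its upper end at this point. Putting
$z = (x_s - a)/(\bar{x} - x_s)$, we obtain, for positive h,

\[ P\{x_s - h(\bar{x} - x_s) \le a \le x_s\} = \int_0^h \frac{(n-1)\,dz}{(1+z)^n} = 1 - \frac{1}{(1+h)^{n-1}}. \]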

For the exponential population, the fiducial function for the estimation of c is

\[ g_2(c) = \int_{-\infty}^{x_s} g(a,c)\,da = \frac{k'\,e^{-T/c}}{c^{n}}, \]

where $T = n(\bar{x} - x_s)$, and the estimators and fiducial ranges are easily determined.

The location and scaling of the rectangular population with centre a and 
range c is simple and interesting; but there is no space here for further discussion 
of illustrative examples; it may just be remarked that }(a,+.2,) is the best 
estimator of a. 


* Cf. Pitman (1937, p. 220).

5. THE ESTIMATION OF THE DIFFERENCE BETWEEN THE LOCATION
PARAMETERS OF TWO POPULATIONS OF THE SAME FORM 


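Suppose that the probability functions of the chance variables X and Y are
respectively

\[ \frac{1}{c}\,f\!\left(\frac{x-a}{c}\right), \qquad \frac{1}{c}\,f\!\left(\frac{y-a-b}{c}\right), \]

and that we wish to estimate b, the other parameters being also unknown. A pair
of samples of values of X and Y,

\[ x_1, x_2, \dots, x_m; \qquad y_1, y_2, \dots, y_n, \]

may be specified by a point in (m+n)-dimensional space. For the Cartesian
co-ordinates of a variable point in this space we shall use

\[ \xi_1, \xi_2, \dots, \xi_m; \qquad \eta_1, \eta_2, \dots, \eta_n. \]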

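A b estimator is any function which is homogeneous of the first degree in the x and
y and which satisfies the relations

\[ B(x_1+\lambda, \dots, x_m+\lambda, y_1, \dots, y_n) = B(x_1, \dots, x_m, y_1, \dots, y_n) - \lambda; \]
\[ B(x_1, \dots, x_m, y_1+\lambda, \dots, y_n+\lambda) = B(x_1, \dots, x_m, y_1, \dots, y_n) + \lambda. \]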


The transformation of §4 is applied separately to the ξ and η co-ordinates,
with a slight modification for the latter;


f= 
Eo = +17 Ne = 9, +178 C08 
= £,+rsin@, cos Ns = +788iN COS Po, 


The Jacobian is r+" s"— 20,D,, where @, is a function of the @ and ®, a function 
of the ¢ only. 


The locus 8, Pr» all constant, 
is a three-dimensional half-space Q, bounded by the two-dimensional plane 
= £2 =... = = Ne =--- = 


The definitions of Eg(H) and P{D' | Q} are* 


E,(H) =~2——_ —— 


P{D' | Q} = 


* As before, if Q@ happens to lie in the hyper-plane ¢, = £,, we must replace £ by £,, where £,—¢, 
is not zero at all points of Q.

where D’ is a domain in Q, and it can easily be proved that $E_Q(H)$ and P{D’ | Q}
have the same properties as before.

The co-ordinates $(\xi_1, \dots, \xi_m, \eta_1, \dots, \eta_n)$ of a point in the half-space through
$(x_1, \dots, x_m, y_1, \dots, y_n)$ may be expressed in terms of three variables t, w, v, as follows:
_%,—t


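Proceeding as before, we finally obtain as the fiducial function for the estimation
of b,

\[ g_3(b) = \int_0^\infty\!\!\int_{-\infty}^{\infty} g(a,b,c)\,da\,dc, \]

where

\[ g(a,b,c) = \frac{k}{c^{m+n+1}} \prod_{r=1}^{m} f\!\left(\frac{x_r-a}{c}\right) \prod_{r=1}^{n} f\!\left(\frac{y_r-a-b}{c}\right), \]

and k is defined by

\[ \int_0^\infty\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} g(a,b,c)\,da\,db\,dc = 1. \]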
0 


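If X and Y are normal variables with the same standard deviation c, and with
means a and a+b respectively,

\[ g(a,b,c) = \frac{k}{c^{m+n+1}}\,e^{-T/2c^2}, \]

where

\[ T = \sum_{r=1}^{m}(x_r-a)^2 + \sum_{r=1}^{n}(y_r-a-b)^2 = S + m(\bar{x}-a)^2 + n(\bar{y}-a-b)^2, \]

with $S = \sum(x_r-\bar{x})^2 + \sum(y_r-\bar{y})^2$. Hence

\[ \int_0^\infty g(a,b,c)\,dc = \frac{k_1}{T^{\frac12(m+n)}} = \frac{k_1}{\{S + m(\bar{x}-a)^2 + n(\bar{y}-a-b)^2\}^{\frac12(m+n)}}. \]

Integrating this from −∞ to ∞ with respect to a, we obtain

\[ g_3(b) = \frac{k_2}{\{S + (\bar{y}-\bar{x}-b)^2/(1/m+1/n)\}^{\frac12(m+n-1)}}. \]

Thus the fiducial distribution of b is of the same form as the fiducial distribution of
a determined by a sample of m +n − 1 from a single normal population of unknown
mean and unknown standard deviation. We have finally

\[ P\left\{\bar{y}-\bar{x} + h_1\sqrt{\frac{(m+n)S}{mn}} \le b \le \bar{y}-\bar{x} + h_2\sqrt{\frac{(m+n)S}{mn}}\right\} = \frac{1}{B\{\frac12, \frac12(m+n-2)\}} \int_{h_1}^{h_2} \frac{dz}{(1+z^2)^{\frac12(m+n-1)}}. \]

Consider now two exponential populations with probability functions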
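\[ \frac{1}{c}\,e^{-(x-a)/c}\ (x \ge a), \qquad \frac{1}{c}\,e^{-(y-a-b)/c}\ (y \ge a+b). \]

Here

\[ g(a,b,c) = \frac{k}{c^{m+n+1}}\,e^{-\{m(\bar{x}-a)+n(\bar{y}-a-b)\}/c}, \quad a \le x_s,\ a+b \le y_s, \]

= 0 for all other sets of values of a, b,
where $x_s$ is the smallest x and $y_s$ the smallest y.
Write

\[ g(a,b) = \int_0^\infty g(a,b,c)\,dc; \]

then

\[ g(a,b) = \frac{k_1}{\{m(\bar{x}-a)+n(\bar{y}-a-b)\}^{m+n}}, \quad a \le x_s,\ a+b \le y_s, \]

= 0 for all other sets of values of a, b.
The conditions $a \le x_s$, $a+b \le y_s$, are equivalent to
$a \le x_s$ when $b \le y_s - x_s$,
and $a \le y_s - b$ when $b \ge y_s - x_s$.
Put $B = y_s - x_s$, $C = m(\bar{x}-x_s) + n(\bar{y}-y_s)$; then when $b \le B$,

\[ g_3(b) = \int_{-\infty}^{x_s} g(a,b)\,da = \int_{-\infty}^{x_s} \frac{k_1\,da}{\{m(\bar{x}-a)+n(\bar{y}-a-b)\}^{m+n}} = \frac{k_2}{\{C + n(B-b)\}^{m+n-1}}. \]

Similarly, when $b > B$,

\[ g_3(b) = \frac{k_2}{\{C + m(b-B)\}^{m+n-1}}. \]

If h is positive

\[ \int_{B-h}^{B} g_3(b)\,db = \frac{k_2}{n(m+n-2)}\left\{\frac{1}{C^{m+n-2}} - \frac{1}{(C+nh)^{m+n-2}}\right\}, \]

\[ \int_{B}^{B+h} g_3(b)\,db = \frac{k_2}{m(m+n-2)}\left\{\frac{1}{C^{m+n-2}} - \frac{1}{(C+mh)^{m+n-2}}\right\}. \]

Since $\int_{-\infty}^{\infty} g_3(b)\,db = 1$, this gives

\[ k_2 = \frac{mn(m+n-2)\,C^{m+n-2}}{m+n}. \]

Hence, if $h_1$ and $h_2$ are positive,

\[ \int_{B-h_1}^{B+h_2} g_3(b)\,db = 1 - \frac{m}{(m+n)(1+nh_1/C)^{m+n-2}} - \frac{n}{(m+n)(1+mh_2/C)^{m+n-2}}. \]

For a given value $\alpha$ of this integral, the range $(B-h_1, B+h_2)$ will be shortest when
$g_3(B-h_1) = g_3(B+h_2)$, that is when

\[ C + nh_1 = C + mh_2. \]

Putting $nh_1 = mh_2 = pC$, we have

\[ \alpha = \int_{B-h_1}^{B+h_2} g_3(b)\,db = 1 - \frac{1}{(1+p)^{m+n-2}}. \]

Hence

\[ P\{B - pC/n \le b \le B + pC/m\} = \alpha, \]

where

\[ p = (1-\alpha)^{-1/(m+n-2)} - 1. \]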
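
A short Python sketch of this result follows. It assumes the reading B = y_s - x_s, C = m(x̄ - x_s) + n(ȳ - y_s) and p = (1 - α)^(-1/(m+n-2)) - 1, where x_s and y_s are the smallest sample values, and it checks the range against the closed form 1 - m/{(m+n)(1+nh₁/C)^(m+n-2)} - n/{(m+n)(1+mh₂/C)^(m+n-2)} for the fiducial probability. The data and function names are hypothetical.

```python
def summary_statistics(xs, ys):
    """Return m, n, B and C for two samples from exponential populations."""
    m, n = len(xs), len(ys)
    x_min, y_min = min(xs), min(ys)
    B = y_min - x_min
    # C = m(xbar - x_min) + n(ybar - y_min), written as sums of excesses.
    C = sum(x - x_min for x in xs) + sum(y - y_min for y in ys)
    return m, n, B, C

def shortest_range_for_b(xs, ys, alpha):
    """Shortest fiducial range (lower, upper) for b with fiducial probability alpha."""
    m, n, B, C = summary_statistics(xs, ys)
    p = (1.0 - alpha) ** (-1.0 / (m + n - 2)) - 1.0
    return B - p * C / n, B + p * C / m

def fiducial_probability(h1, h2, m, n, C):
    """Fiducial probability of B - h1 <= b <= B + h2, for positive h1 and h2."""
    k = m + n - 2
    lower_tail = m / ((m + n) * (1.0 + n * h1 / C) ** k)
    upper_tail = n / ((m + n) * (1.0 + m * h2 / C) ** k)
    return 1.0 - lower_tail - upper_tail

xs = [1.2, 0.7, 2.5, 1.1]        # hypothetical sample of X
ys = [3.0, 2.2, 4.1, 2.9, 2.6]   # hypothetical sample of Y
alpha = 0.90
lower, upper = shortest_range_for_b(xs, ys, alpha)
m, n, B, C = summary_statistics(xs, ys)
print(lower, upper, fiducial_probability(B - lower, upper - B, m, n, C))
```

The range is not symmetrical about B unless m = n, because the two branches of the fiducial function fall away at different rates.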


Upon this result we can base a test for exponential populations analogous to
Fisher’s extension of “Student’s” test for normal populations. A similar test for
rectangular populations can be obtained in the same way.


6. CONCLUDING REMARKS 


More complicated problems of estimation of location and scale parameters,
for example those which arise when we have samples from more than two popula- 
tions, can be dealt with by the methods of this paper. Questions about statistical 
tests of hypotheses concerning such parameters can be treated in the same way. 
Here it has been impossible to do more than just glance at this side of the subject; 
but it is hoped to continue the discussion in a later paper. 


SUMMARY 


The main problem considered is the location and scaling of the distribution of a 
continuous chance variable X. We suppose that the probability function of X is 


\[ \frac{1}{c}\,f\!\left(\frac{x-a}{c}\right), \quad c > 0, \]


where the function f(x) is known but one or both of the parameters a, c, which
determine respectively the location and scale of the distribution, is unknown. 
We have a sample of n independently observed values of X, and from these we 
have to estimate the unknown parameter or parameters. Any function of the 
sample values whose value may be used as an estimate of an unknown parameter 
is called an estimator of that parameter. The paper shows how to determine an 
estimator with any required property, such as minimum mean absolute error, 
or minimum mean square error. In particular, the closest estimator is determined; 
this is an estimator whose median value is the true value of the parameter and 
which is likely to be closer to the true value than any other estimator. It is shown 
that in certain cases a best estimator exists. 

Fiducial limits for the unknown parameter are determined, and what is called 
the fiducial distribution of the parameter is defined. It is shown that problems 
of estimation can be dealt with very simply, and completely, by means of fiducial 
distributions. For a population of any given form, the fiducial distribution of a, 
when both a and c are unknown, provides us with a test which corresponds to
“Student’s” test for significance of the mean of a sample from a normal
population. 

The estimation of the difference between the location parameters of two 
populations of similar forms is discussed.

METHODS OF ESTIMATING THE POPULATION OF
INSECTS IN A FIELD 


By GEOFFREY BEALL 
Dominion Entomological Laboratory, Chatham, Ontario, Canada 


PURPOSE OF STUDY 

Counts on the occurrence of an insect were secured to make clear a valid and 
efficient method of estimating the population of the insect in an area. The 
theory of sampling, as developed by Neyman (1934), was applied to this problem. 
It was desired to know: first, what form observations should take in sampling; 
secondly, how good are the results of stratification, or control of regional varia- 
bility; and, thirdly, how accuracy varies when various fractions of the total 
area are sampled. The present study should supply a general method of sampling 
to be applied in experimental work or in surveys. Details in connexion with
the method would, presumably, vary according to the insect, to the type of crop 
under investigation, to the number and size of samples possible, and to the 
importance of damage to the crop. 

The problem investigated was that of estimating the total number of insects 
in a single field. This problem differs to some extent from that of estimating 
the population in an area such as a county. 


REVIEW OF LITERATURE 

A considerable amount of investigation on the best method for sampling in 
agronomic work has been carried out. Some of that work, pertinent to the 
present problem, is discussed below with two investigations on the technique 
of sampling for insects. 

The usage of Wishart & Clapham (1929) may first be noted. To these workers
“units” are “the ultimate parts of a sample”, that is, the smallest area from
which yield has been examined; “sampling-units” are the “parts of a sample
which are located independently and at random within the area to be sampled.
Each may consist of one or many units”; a “sample”
sampling-units taken from the area”.

Clapham (1929) made a study of various methods of sampling cereals from 
a plot. This work showed, first, systematic arrangement of sampling-units to 
give an invalid estimate of chance variability and so random drawing to be 
necessary, secondly, the variability of estimates to be much smaller when 
samples were drawn from within subplots than when drawn from the plot in 
general, and thirdly, drawing throughout the plot to be superior to drawing
from randomly chosen rows. The latter procedure was the least laborious and
valid but gave a high error to estimates. Later, Clapham (1931) discussed the 
practical technique of locating sampling-units. 

Clapham (1929) pointed out that a systematic arrangement of units may
be combined into a sampling-unit, and Wishart & Clapham (1929) considered 
and employed sampling-units of complex patterns of units. Kalamkar (1932) 
made a uniformity trial on wheat, with the unit employed a half-metre of drill. 
He formed sampling-units in various ways from groups of four units and con- 
cluded that the only satisfactory sampling-unit is a strip running transversely 
to the direction of the rows. 

Influenced by agronomic work of the type discussed above, Marshall (1936) 
made a study of the most suitable method of sampling a field in the determination 
of oviposition by the moth, Heliothis obsoleta Fabr. As had been found in agronomic 
work, so in this work Marshall found variations in eggs per 3 yd. of row to be 
greater between than within rows. He found the part, ascribable to sampling 
errors, of the variability between plots to be small with even 1 or 2 % sampling. 
A second study on sampling for insects is that of Fleming & Baker (1936). They 
made counts on numbers of larvae of Popillia japonica Newman present per unit
area of 1 sq. ft. over four fairly large blocks of land and they recommended that
a sample of at least 1 % be taken. 


DESCRIPTION OF EXPERIMENTAL MATERIAL 


Suitable material upon which to test theoretical results in the problem of 
sampling insect populations was found in the adult Colorado potato beetle, 
Leptinotarsa decemlineata Say. This insect is easily counted since it is both 
seen and collected rapidly. Such a count on the number of beetles present in 
a field near Chatham, Ontario, was made on 14 August 1935. This field was 
infested to the unusual extent of about two beetles to the linear foot of potato 
row. The field, a little more than an acre in extent, was fifty-eight rows of potatoes 
wide and about as broad as long. The plants were, on the average, spaced within 
the row a little more than a foot apart. The plot chosen for examination was 
forty-eight rows, or 124 ft. wide, and 96 ft. long. This plot included one margin 
of the field. The field was surrounded by various other crops. 


THEORETICAL BASIS OF WORK 


The paper of Neyman (1934) was the theoretical basis of the present work.
Neyman discussed the general theory of sampling from strata. By the term 
strata, so far as the present work is concerned, is meant arbitrary subdivisions 
of an area of which the population is to be estimated. From within each stratum 
a certain number of sampling-units was selected. These sampling-units were 
of various kinds formed by combinations of a number of smaller basic units of
a fixed size. The terms sampling-unit and unit have been employed by Wishart
& Clapham (1929) as previously discussed. 

In much work, such as surveys, it will be desirable to make strata correspond 
at least roughly with obvious features, such as slope or wetness of land, which 
will affect the abundance of insects. In experimental work within one field, 
however, it is common practice to select areas that appear to be as nearly as 
possible homogeneous. Further, to have a uniform series of subareas may be 
practically convenient in making counts with a group of workers. Accordingly, 
equal strata will be commonly employed, and in the present paper the discussion 
was restricted to such strata. If each stratum is of the same area, that is, contains 
the same number of sampling-units, the equations and the numerical calculations 
involved in making estimates of population values are more simple than the 
general equations and calculations of Neyman (1934).

Denote by N the number of strata and by M the size of a stratum in terms of 
the number of sampling-units contained. M will, of course, vary with the size 
of the sampling-unit. Whatever the number be, each stratum will be divided 
into M sampling-units such that each is a potential sample. 

Consider the notation for the total sampled population. Let X represent
the total number of an insect in the area to be examined. Let $u_{ij}$ denote the
number of the insect in the jth sampling-unit (j = 1, 2, ..., M) of the ith stratum
(i = 1, 2, ..., N) and $u_{\cdot\cdot}$ denote the average number, calculated over the whole
field, of the insect per sampling-unit. Then

\[ X = \sum_{i=1}^{N}\sum_{j=1}^{M} u_{ij} = MN\,u_{\cdot\cdot}. \]

Within the ith stratum let the mean value of $u_{ij}$ be $u_{i\cdot}$, and the variance

\[ \sigma_i^2 = \frac{1}{M-1}\sum_{j=1}^{M}(u_{ij}-u_{i\cdot})^2. \]

It should be noted that $\sigma_i^2$, as here defined, is M/(M − 1) times greater than the
parallel quantity employed by Neyman (1934). For this discrepancy, allowance
was made in all equations quoted.

Consider now the notation for the samples. Denote by $m_i$ any number of
sampling-units drawn from the ith stratum. Let the numbers of an insect found
in the sampling-units of the ith stratum be $x_{i1}, x_{i2}, \dots, x_{im_i}$, with mean $x_{i\cdot}$ and
with estimated variance, $s_i^2$,

\[ s_i^2 = \frac{1}{m_i-1}\sum_{j=1}^{m_i}(x_{ij}-x_{i\cdot})^2. \]

The best linear estimate of X, that is the estimate with minimum S.D., will be

\[ F = M\sum_{i=1}^{N} x_{i\cdot}. \]

The standard deviation of F, when the $m_i$ sampling-units have been drawn
randomly, will be, following Neyman (1934),

\[ \sigma_F = \left\{\sum_{i=1}^{N}\frac{M_i(M_i-m_i)}{m_i}\,\sigma_i^2\right\}^{\frac12}, \qquad (3) \]

where $M_i$ is the total number of sampling-units in the ith stratum. Since, in
the present work, all values of $M_i = M$, equation (3) reduces, so that

\[ \sigma_F = \left\{M\sum_{i=1}^{N}\frac{M-m_i}{m_i}\,\sigma_i^2\right\}^{\frac12}. \qquad (4) \]

A common system of apportioning sampling-units is to make the number
from each stratum proportional to the magnitude of the stratum. In the present
work the number of sampling-units drawn from each stratum would be the
same, that is $m_i = m$ in all cases. Under these circumstances, equation (4)
reduces, so that

\[ \sigma_F = \left\{\frac{M(M-m)}{m}\sum_{i=1}^{N}\sigma_i^2\right\}^{\frac12}. \qquad (5) \]
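
The estimate and its standard deviation for equal strata can be sketched in Python as follows. The sketch assumes that F is M times the sum of the stratum sample means and that equation (5) reads σ_F = {M(M - m)Σσ_i²/m}^½; the sample variances s_i² are substituted for the unknown σ_i², which is an assumption of this example. The counts and function names are hypothetical.

```python
import math

def sample_variance(values):
    """Estimated variance with divisor (number of values - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def estimate_total(samples, M):
    """F = M * (sum over strata of the sample mean), for strata of M sampling-units each."""
    return M * sum(sum(s) / len(s) for s in samples)

def sd_of_estimate(stratum_variances, M, m):
    """Equation (5): standard deviation of F when m sampling-units are drawn per stratum."""
    return math.sqrt(M * (M - m) / m * sum(stratum_variances))

M = 144                               # sampling-units per stratum
samples = [[3, 5, 4, 8], [1, 0, 2, 1]]  # hypothetical counts, m = 4 per stratum
m = len(samples[0])

F = estimate_total(samples, M)
sd = sd_of_estimate([sample_variance(s) for s in samples], M, m)
print(F, sd)
```

When m = M every sampling-unit is counted and the factor (M - m) makes the standard deviation vanish, as it should for a complete enumeration.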


FORM IN WHICH DATA WERE COLLECTED AND ANALYSED

On the basis of the foregoing theoretical discussion the form of collection 
and of analysis of data on the number of beetles present in the observational
area was determined. This area was divided into small, approximately square, 
units. The population of beetles in each unit was recorded. This count was the 
equivalent of a uniformity trial in agronomic work. 

For the purpose of the present work a 2 ft. length of row was the unit of 
observation. To obtain these units, strings were run transversely to the rows of 
potatoes across the area at intervals of 2 ft. There were 2304 units involved. 
The number of beetles in each unit was counted. 

Various types of sampling-unit were formed by combining adjacent units 
in various ways. For sampling-units of each of a number of given sizes, various
shapes and orientations were examined. Compact sampling-units, not those 
compounded of scattered units, as suggested by Wishart & Clapham (1929), 
were employed. The compact form seemed the only one practically possible. 
In the course of the present work nine types of sampling-unit were investigated
as listed, with reference numbers, in Table I. This table indicates for each type
of sampling-unit, first, the smallest dimension in terms of units, secondly, the
number, k, of units embraced, and thirdly, the direction of the long axis with
respect to the rows of potatoes.


TABLE I
The various types of sampling-unit employed

Reference   Smallest dimension   Size = k   Orientation of long axis with
number      (units)                         respect to direction of rows

    1              1                 1      -
    2              1                 2      Parallel
    3              1                 2      Transverse
    4              1                 4      Parallel
    5              1                 4      Transverse
    6              2                 4      -
    7              1                12      Parallel
    8              1                12      Transverse
    9              3                12      Transverse

The types of sampling-unit listed in Table I are shown diagrammatically 
in Fig. 1. On each form the number of the type is shown. 

Having obtained the number of beetles in each unit, it became possible to 
determine by trial the best shape and orientation and also the best size for 
sampling-units. It may be expected that the occurrence of insects noted in a field 
will vary with the direction in which an observer moves in the field, and that of 
two directions at right angles, one will show more differentiation than the other. 


Fig. 1. The various types of sampling-unit employed.


Thus, in entomological work, Marshall (1936) found variability between rows to 
be greater than that within rows. Direction of ploughing and slope of a field 
also tend to differentiate the observations in certain directions. One may expect 
the population of phytophagous insects to be influenced by variability in the 
plants of a field. Also differences of shade and of wind in a field, migration 
along rows of plants, and point of ingress to a field, are all factors that tend to 
make the insect population variable in certain directions. Accordingly, the shape 
of soil surface forming a sampling-unit may be expected to be of importance in 
the determination of the accuracy of estimates made from a sample. Long 
narrow sampling-units running in the direction of greater differentiation should 
be the most efficient. 

Just as various types of sampling-units were tested on the data collected, so 
might one have tested various types of strata. Presumably, the best type would 
be a long rectangle running in the direction of lesser variability. However, the
problem of type of stratum was not investigated, since, as is discussed below,
it was thought advisable to fix the strata coterminal with the areas examined 
by each man. 

It was necessary to cover a considerable area and to cover it in one day, since 
the population of beetles was changing rapidly from day to day. Accordingly, 
four men, A, B, C and D made counts. To each man were allotted four subareas, 
or strata, twelve units square. When the counting was arranged, the square 
form, in units, was chosen for the subareas assigned to each man, in case these 
subareas should have to serve as strata, because, within square strata sampling- 
units could be formed to the same extent longitudinally as transversely. The men 
were arranged in Latin square form so that personal effects should not be confused
with trends across the field and so that the effect of each man might be discernible.
The positions of the four men involved, and their collections are shown
in Fig. 2.


      D         B         A         C
    1127      1331       628       430
      C         A         D         B
     658       635       969       758
      B         D         C         A
     869       794       560       411
      A         C         B         D
     523       490       213       517

Direction of rows


Fig. 2. The total numbers of beetles taken in each subarea assigned to four men.


PRESENTATION OF DATA 


The primary data upon which the present paper was based are given com- 
pletely in Table VI of the Appendix. In Fig. 3 the general nature of the variation 
in population, throughout the area studied, is indicated. In this figure the 
population density over the area examined is indicated by the population on 
144 equal constituent subareas, four units square. The counts for the 16 units 
in each subarea were totalled. In the figure each subarea is represented by a 
black spot of which the area is proportional to the number of beetles found 
on the subarea. 


HOMOGENEITY OF DATA 
Before considering the questions, indicated in the foregoing discussion, of
goodness of various sampling-units, or of the efficiency of the various methods
of apportioning sampling-units, the general nature of the insect distribution in
the observational area was investigated. The data of Table VI, which are presented
graphically in Fig. 3, suggested that there had been much variation from stratum
to stratum in the number of insects present. The part of variability ascribable
to differences between the observers and also the magnitude of the chance
variability between the strata, as compared with that within strata, were considered.


Fig. 3. Diagrammatic representation of population density over
the area under observation.


For the 2304 sampling-units of type no. 1, with k=1, the total variability
was broken into a part within strata and a part between strata. Since the four
men, who made counts, were assigned in the manner of a Latin square, as shown
in Fig. 2, the variability between strata was broken into parts ascribable to
rows, columns, men and a remainder term. The analysis is presented below,
although subsequent work by the present writer to be published later has shown 
that analysis of the type carried out, involving the use of normal theory, is not 
strictly applicable to entomological data because the chance variability of the 
number of insects observed is related to that number. 

In the following analysis of variance, the mean square from within strata 
can be compared with the mean square from the remainder for between strata. 
This remainder is free from differences ascribable to the rows, columns and men. 
One is, then, comparing the chance variability for small subareas within a small 
total area with the chance variability for larger subareas within a larger total 
area. The very great difference observed in the following tabulation between
these two variabilities showed that the strata, apart from the differences introduced
by observers and even when row and column effects were removed, varied
much more than sampling-units within strata. Accordingly, a very great amount 
of the variability within the area should be controllable by stratification. 


Variability                   Degrees of     Sum of        Mean
ascribed to                   freedom        squares       square

Rows                               3         1,695.8       565.3
Columns                            3         2,925.8       975.3
Men                                3         2,235.0       745.0
Remainder between strata           6         1,627.7       271.3
Within strata                   2288        26,064.7        11.4
Total                           2303        34,549.1
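
The between-strata part of this analysis can be reproduced from the stratum totals of Fig. 2 alone. The Python sketch below assumes that each stratum contains 144 units (twelve units square) and that the Latin-square letters are as read from Fig. 2; the variable names are illustrative. The lines of the printed figure are called "figure rows" and "figure columns" here, since which of them corresponds to "Rows" in the tabulation depends on the orientation of the figure.

```python
UNITS_PER_STRATUM = 144  # each stratum is twelve units square

men = [["D", "B", "A", "C"],
       ["C", "A", "D", "B"],
       ["B", "D", "C", "A"],
       ["A", "C", "B", "D"]]
totals = [[1127, 1331, 628, 430],
          [658, 635, 969, 758],
          [869, 794, 560, 411],
          [523, 490, 213, 517]]

def sum_of_squares(group_totals, units_per_group, correction):
    """Sum of squares between groups, computed from group totals."""
    return sum(t * t for t in group_totals) / units_per_group - correction

grand_total = sum(sum(row) for row in totals)
correction = grand_total ** 2 / (16 * UNITS_PER_STRATUM)

man_totals = {}
for man_row, total_row in zip(men, totals):
    for man, total in zip(man_row, total_row):
        man_totals[man] = man_totals.get(man, 0) + total

units_per_line = 4 * UNITS_PER_STRATUM
ss_figure_rows = sum_of_squares([sum(r) for r in totals], units_per_line, correction)
ss_figure_columns = sum_of_squares([sum(c) for c in zip(*totals)], units_per_line, correction)
ss_men = sum_of_squares(man_totals.values(), units_per_line, correction)
ss_strata = sum_of_squares([t for r in totals for t in r], UNITS_PER_STRATUM, correction)
ss_remainder = ss_strata - ss_figure_rows - ss_figure_columns - ss_men

print(round(ss_figure_rows, 1), round(ss_figure_columns, 1),
      round(ss_men, 1), round(ss_remainder, 1))
```

The four sums of squares agree with the entries 2,925.8, 1,695.8, 2,235.0 and 1,627.7 of the tabulation, the figure rows corresponding to the line headed "Columns". The within-strata sum of squares cannot be obtained from Fig. 2, since it requires the individual unit counts of Table VI.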


In the analysis of variance, the remainder variability between strata, free 
from the variability of rows, columns and men, consisted of the chance variability 
between strata and possibly, also, of a differential response by a given man in 
the various strata in which he worked. To assess the significance of the variability 
of men the appropriate sum of squares must be referred to this remainder term, 
since the differences between men are subject to the chance variability between 
strata. 

When the mean square for the men was compared with the mean square for the 
remainder, the result was within the 0.05 level of probability, so that the differences
between men were not proved significant. The effect of the men was not appre- 
ciable over the variability from stratum to stratum, possibly, because this 
variability was estimated with only 6 degrees of freedom. There is further evi- 
dence, however, in the primary data of Table VI, which leads one to judge that 
the effect of the men was appreciable. In these data the counts made by each 
man may be viewed in either dimension as 12 rows of units. Consideration of 
such rows suggested that the differences between adjacent rows in the area 
covered by one man were smaller than the differences between adjacent rows on 
the borders of the areas covered by two men. Such differences are free from 
the great chance variability of strata. In considering these differences the 
uniformity from observer to observer of the work done can be studied without 
supposing that any one man worked with uniform efficiency. One can show 
statistically that the men collected differently. 
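
The comparison of the mean square for men with the remainder mean square can be set out in a few lines of Python. The sums of squares and degrees of freedom are those of the tabulation above; the critical value 4.76 is the usual tabulated upper 5 % point of the variance ratio for 3 and 6 degrees of freedom, supplied here as an assumption of the example rather than taken from the paper.

```python
# Sums of squares and degrees of freedom from the analysis of variance.
ss_men, df_men = 2235.0, 3
ss_remainder, df_remainder = 1627.7, 6

mean_square_men = ss_men / df_men
mean_square_remainder = ss_remainder / df_remainder
variance_ratio = mean_square_men / mean_square_remainder

CRITICAL_5_PERCENT = 4.76  # tabulated 5 % point of F for (3, 6) degrees of freedom

print(round(variance_ratio, 2), variance_ratio < CRITICAL_5_PERCENT)
```

The ratio is about 2.75, well short of the 5 % point, which is why the differences between men could not be proved significant by this test.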

Since the differential efficiency with which the men collected would make
the strata composed of units collected by more than one man unduly heterogeneous,
it was thought advisable to use as strata the subareas worked over by
each man. In this procedure the variability introduced by the men was combined
with regional variability.

Any man must miss insects when he is making a count and, from the discussion
above, the number missed appears to vary with the observer. These
considerations modify the meaning of X, which must be regarded as the total 
number of insects that may be found, with complete examination, by the 
particular men employed in counting, rather than the total number of insects 
present in the area. In field work where one is making counts on one area to be 
compared with those from another area, variability in the performance of ob- 
servers would occur and must be taken into consideration. Thus, in experimental 
work all counting on a given block may be done by a single man or in counts
for a survey of a number of fields, each man may do a constant fraction of the 
work in each field. 


EFFICIENCY OF VARIOUS TYPES OF SAMPLING-UNIT 


The relative efficiency of various types of sampling-unit of the same magnitude,
in the case where m sampling-units were drawn from each stratum, was
judged by means of the following equation, derived from equation (5) by putting
$M_0 = MN$, $m_0 = mN$:

\[ \sigma_F = \left\{\frac{M_0(M_0-m_0)}{m_0}\,\sigma_0^2\right\}^{\frac12}. \qquad (6) \]

It can be seen that, when the total number, $M_0$, of sampling-units in the field,
the total number, $m_0$, of sampling-units to be drawn, and the number, N, of
strata, are fixed, then the accuracy of the estimate, F, is determined by the
average variance within strata, $\sigma_0^2 = \sum_{i=1}^{N}\sigma_i^2/N$. Accordingly, among several forms
of sampling-unit, that which gives the smallest value to $\sigma_0^2$ gives the greatest
accuracy to the estimate, F.
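
Equation (6) can also be turned round to give the number of sampling-units needed for a stated accuracy. The Python sketch below assumes the form σ_F² = M₀(M₀ - m₀)σ₀²/m₀, which solves to m₀ = M₀²σ₀²/(σ_F² + M₀σ₀²). The figures used are illustrative: M₀ = 2304 units, σ₀² = 11.39 (the mean variance within strata for single units), and a total of 10913 beetles, which is the sum of the stratum totals of Fig. 2; the 5 % target is a hypothetical choice.

```python
import math

def sd_of_estimate(M0, m0, mean_variance):
    """Equation (6): standard deviation of F for m0 sampling-units out of M0."""
    return math.sqrt(M0 * (M0 - m0) * mean_variance / m0)

def required_sample_size(M0, mean_variance, target_sd):
    """Solve equation (6) for m0, given the standard deviation wanted."""
    return M0 * M0 * mean_variance / (target_sd ** 2 + M0 * mean_variance)

M0 = 2304                  # sampling-units of one unit each in the area
mean_variance = 11.39      # average variance within strata, k = 1
total_beetles = 10913      # sum of the sixteen stratum totals
target_sd = 0.05 * total_beetles  # hypothetical target: s.d. of 5 % of the population

m0 = required_sample_size(M0, mean_variance, target_sd)
print(m0, 100.0 * m0 / M0)  # number of units and percentage of area
```

In practice m₀ would be rounded up to a whole number of sampling-units divisible by the number of strata.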

The relative efficiency of sampling-units differing in magnitude was judged
by means of equation (7), shown below. When the number of strata and the
fraction of the area to be sampled are fixed, $\sigma_F$ may be supposed affected by
increase in size of the sampling-unit. Suppose that when the sampling-unit is
of unit size one obtains $\sigma_F'$, and, when the sampling-unit consists of k > 1 units,
one obtains $\sigma_F''$. When k=1 let M have the value M' and m the value m'. For
each value of k there will be a value ${\sigma_0''}^2$ for $\sigma_0^2$ and values, M'' and m'', for M and m,
such that $M''/M' = m''/m' = 1/k$. From equation (5),

\[ \sigma_F'' = \left\{\frac{M'(M'-m')}{m'}\,\frac{N{\sigma_0''}^2}{k}\right\}^{\frac12}. \qquad (7) \]

From equation (7) it is apparent that for any sampling-unit, of k>1 units, the
relative magnitude of $\sigma_F''$ and of $\sigma_F'$ will vary as the relative magnitude of $\sqrt{({\sigma_0''}^2/k)}$
and of $\sqrt{{\sigma_0'}^2}$. Accordingly, the relative efficiency of sampling-units of any sizes
may be judged by the relative magnitudes of ${\sigma_0''}^2/k$. The expectations of ${\sigma_0''}^2/k$
would, of course, be the same if the insects involved were distributed quite
randomly over the strata. Under these circumstances, large or small sampling-units
would be equally good.


For sampling-units of various forms and of various sizes, that is with various
values of k, ${\sigma_0''}^2$ and ${\sigma_0''}^2/k$ are shown in Table II.


TABLE II

Mean variance, within the sixteen strata, for various sampling-units

Sampling-unit
type of Table I       k       ${\sigma_0''}^2$       ${\sigma_0''}^2/k$

      1               1        11.39        11.39
      2               2        25.27        12.63
      3               2        24.50        12.25
      4               4        61.39        15.35
      5               4        50.12        12.53
      6               4        58.08        14.52
      7              12       298.91        24.91
      8              12       150.37        12.53
      9              12       233.76        19.48

From the values of σ_k² it can be seen that, within each size class of
sampling-units, the long narrow form (nos. 3, 5 and 8) running transversely to
the direction of the rows was the most efficient, and that the long narrow form
(nos. 2, 4 and 7) running in the direction of the rows was the least efficient.
Such a result is explained by the apparent correlation of the number of beetles
on units along the rows of potatoes, as shown in Table VI. It is of interest to
note that for long narrow sampling-units running transversely to the rows the
value of σ_k²/k changed but little. Such a result means that in the direction
considered, the population of the insect was practically randomly distributed.
Bearing in mind equation (7), it can be seen from the values, σ_k²/k, that the
value of σ_F² was in general greater as the value of k, or the size of the
sampling-unit, increased.
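The comparison drawn from Table II can be checked with a short calculation. The sketch below is only an illustration: the figures are those of Table II, while the function names are hypothetical. It divides each mean variance by k and picks, within each size class, the type with the smallest variance per unit of area.

```python
# Table II as (sampling-unit type, k, mean within-stratum variance).
TABLE_II = [
    (1, 1, 11.39), (2, 2, 25.27), (3, 2, 24.50),
    (4, 4, 61.39), (5, 4, 50.12), (6, 4, 58.08),
    (7, 12, 298.91), (8, 12, 150.37), (9, 12, 233.76),
]


def variance_per_unit_area(table):
    """Return {type: variance / k} for every sampling-unit type."""
    return {unit: var / k for unit, k, var in table}


def best_in_each_size_class(table):
    """Return {k: type}, choosing the smallest variance / k for each size k."""
    best = {}
    for unit, k, var in table:
        if k not in best or var / k < best[k][1]:
            best[k] = (unit, var / k)
    return {k: unit for k, (unit, _) in best.items()}


per_area = variance_per_unit_area(TABLE_II)
best = best_in_each_size_class(TABLE_II)
print(best)
```

The types selected are nos. 1, 3, 5 and 8, which are the units carried forward into the later comparison of methods of apportionment.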

The values shown in Table II indicated that the estimate of F from a given
amount of sampling had least variability with the smallest sampling-unit
employed. While this conclusion applies to the case where m_i was the same for
all strata, a similar effect probably occurs in the more general case, where m_i
differs from stratum to stratum.

Day (1920) pointed out that long plats are only best when the length of the 
plat lies along the direction of the greater changes of soil fertility. He suggested 
that if the direction of greater differentiation is unknown square plats are probably 
best. From the data of the present work, even when narrow sampling-units 
running transversely to the direction of the rows were employed, the estimate 
of F was a little less reliable for k>1 than for k=1. In practice, since it may
happen that some phenomenon such as slope acts against the effect of row direction
in deciding the direction of greater variability, one would not necessarily know
the direction in which long sampling-units should run to get the best results. 
Accordingly, it is probable that, in general, the best results would be obtained 
with the smallest sampling-unit. 


EMPLOYMENT OF STRATIFICATION 


When the question of the type of sampling-unit is decided, the problem of
the best method for apportioning sampling-units must be considered. Accordingly,
for the data of the present work, the percentage of area, i.e. 100m_0/M_0, which
would need to have been sampled to secure a specified degree of accuracy was
calculated. The degree of accuracy was expressed in the familiar form of standard
deviation of the estimate of population in terms of the population, i.e. by σ_F/X.
For sampling-units of a given type there was found the total number, m_0,
(a) necessary in order to obtain a given value of σ_F without stratification, and
(b) necessary with the number, m_i, examined in each stratum proportional to M_i.
The respective values, m_0, were found simply from equation (5), for in the case
of no stratification, N=1. The value of m_0 was, also, determined for m_i made
proportional to σ_i, as will be discussed in the next section of this paper.
Table III indicates the proportional amount of sampling necessary to ensure
values of σ_F/X equal to 0.01 and 0.10, by employing the various methods of
apportioning sampling-units of various sizes previously discussed. The
calculations were made for sampling-units of size k=1, 2, 4 and 12, when the best
shaped and orientated sampling-unit, i.e. nos. 1, 3, 5 and 8 of Table I, in each
size-class was employed.

From Table III it is apparent that there was a considerable reduction in the 
percentage of the area to be covered when stratification was employed. The 
reduction was greatest when the sampling-units were large and also when the 
desired degree of accuracy was low. The results also indicate that further reduction 
was effected when the number of sampling-units apportioned to each stratum 
was proportional to the standard deviation per stratum. It will be noted that
such a system is hardly practicable unless k, the size of sampling-units, is of
such a value that M, the total number of sampling-units per stratum, is great.


OPTIMAL APPORTIONMENT OF THE WORK OF SAMPLING 


If stratification be employed, the value of σ_F is not reduced to the lowest
level possible for a given amount of sampling by making values of m_i
proportional to M_i. Neyman (1934) considered how m_0 = Σ_{i=1}^{N} m_i
sampling-units should be apportioned to the N strata so that σ_F shall be
minimal. He found that σ_F² is minimal if the values, m_i, are proportional to
M_iσ_i, and then σ_F² takes the value given by equation (8), where M_0 = MN. In
the present work, where M was the same for each stratum, σ_F² was minimal if the
values, m_i, were proportional to σ_i, and equation (8) reduces


so that (M2 N 2 N 
Or = oj}. (9) 
Mo \i=1 i=1 
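The rule of apportionment attributed to Neyman, m_i proportional to M_iσ_i, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the three strata and their standard deviations are invented and are not the data of this paper, and integrality of the m_i is ignored. The feasibility check reflects the two limits mentioned in connexion with Table III: at least two sampling-units per stratum, and no more than the stratum contains.

```python
def neyman_allocation(m0, stratum_sizes, sigmas):
    """Apportion m0 units so that m_i is proportional to M_i * sigma_i.

    The shares are returned unrounded.
    """
    weights = [M * s for M, s in zip(stratum_sizes, sigmas)]
    total = sum(weights)
    return [m0 * w / total for w in weights]


def infeasible_strata(allocation, stratum_sizes, minimum=2):
    """Indices of strata whose share is below the minimum or exceeds M_i."""
    return [
        i
        for i, (m, M) in enumerate(zip(allocation, stratum_sizes))
        if m < minimum or m > M
    ]


# Hypothetical strata, chosen only for illustration.
sizes = [144, 144, 144]
sigmas = [2.0, 3.0, 5.0]

alloc = neyman_allocation(100, sizes, sigmas)
heavy = neyman_allocation(400, sizes, sigmas)
print(alloc)
print(infeasible_strata(heavy, sizes))
```

With equal strata the shares depend on the σ_i alone. At the heavier level of sampling the most variable stratum is asked for more units than it contains, which is the kind of unreal solution noted under Table III.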


TABLE III


The percentage of area which must be sampled in order to obtain
a specified degree of accuracy


                                              With stratification
Sampling-unit              Without        m_i proportional  m_i proportional
type of Table I    k    stratification        to M_i            to σ_i

Degree of accuracy, σ_F/X = 0.01
       1           1        74.37             68.79             61.72†
       3           2        79.01             70.32
       5           4        83.91             70.80
       8          12        91.55             70.80

Degree of accuracy, σ_F/X = 0.10
       1           1         2.82              2.16              1.95†
       3           2         3.63              2.31*
       5           4         4.96              2.37*
       8          12         9.77              2.37*


* Note that actually it would have been impossible to use these values since the necessary
minimum of two sampling-units per stratum would not have been attained.

† These solutions were somewhat unreal, since, with the levels of sampling and with the
variability per stratum involved: (1) in the case of σ_F/X = 0.01, although there were only 144
sampling-units, 150 would have to have been apportioned to the most variable stratum; (2) in the
case of σ_F/X = 0.10, only 1.17 sampling-units would have to have been apportioned to the least
variable stratum. The last column is discussed at length in the next section of this paper.


To complete the discussion on Table III it may be noted that values of m_0,
necessary to obtain a given value of σ_F, as shown in that table, can be found
simply from equation (9) when m_i is made proportional to σ_i. In computing
these values of m_0, the requisite integrality of the number of sampling-units
per stratum was ignored and the limit of accuracy possible by this method of
sampling was found. This apportionment was made with only sampling-unit
no. 1 (k=1), since for larger sampling-units the results tend to be meaningless
with the present data. For example, in the case of k = 12, m_i in the least variable
stratum could not fall below 2 and in the most variable stratum could not
exceed 12. The largest value of σ_i is 4.01 times greater than the smallest, so
practically only one level of such sampling was possible. As can be seen in
Table III, even for k=1, with the range of accuracy considered, one or two of
the assignments to strata with extreme variability were unreal.


APPROACH TO OPTIMAL APPORTIONMENT IN PRACTICE


In the foregoing discussion it has been pointed out that, if m_0 sampling-units
are apportioned to the strata so that m_i is proportional to σ_i, σ_F is minimal.
Since in practice one would not know the values, σ_i, an optimal apportionment
of sampling could not be made exactly. Estimates, s_i, made from preliminary
sampling, might be employed, however, in place of the true values, σ_i.

One can find the probability that, when m_i is made proportional to s_i, the
value of σ_F will be smaller than when m_i is constant. It can be seen that, by the
first system of apportionment, σ_F will be subject to chance variation depending
upon the estimates, s_i. One can determine, however, for a preliminary sample
of any size, the probability that σ_F will be greater under the first system than
under the second. This determination can be made by using the moments of
z = m_0σ_F²/M² as given approximately by Sukhatme (1935).

The probability that the value of σ_F would be less with m_i proportional to s_i,
based on 15, 10 or 6 sampling-units, than with m_i constant, was found by applying
the procedure of Sukhatme to the data of the present paper. For each number,
15, 10 or 6, the first three moments of z were found. With preliminary samples
of 15 and 10, β_1 was 1.38 and 2.01, respectively, so that a type III Pearson curve
was fitted by the first three moments. In the case of a preliminary sample
of 6, β_1 = 5.66 was so great that a type III curve was fitted by the first two
moments and the start, which comes from the value of σ_F when m_i is proportional
to σ_i. From these three curves the probability that the value of σ_F would be
less with m_i proportional to s_i than with m_i constant was 0.99973, 0.957 and
0.519, respectively. These probabilities show that improvement in the estimate,
F, is almost certain to result from a preliminary sample of 15, will probably
result from one of 10 but doubtfully so from one of 6. It should be noted that
Sukhatme did not advise using a preliminary sample smaller than 15.

Sukhatme suggested, as illustrated in the next section, that one might
incorporate preliminary sampling, made to estimate s_i, with supplementary
sampling in a total sample to be used in forming the required estimate of
population.

It is conceivable that a preliminary estimate of the relative magnitude of 
the values o; might be made from a cursory or visual survey rather than from 
exact preliminary sampling. Although it is difficult to appreciate variability, 
advantage might be taken of the relationship that exists between the number 
of insects per unit area and the chance variability of that number, since the 
level of population is easily appreciated. Thus if a field man were to judge from 
a visual survey that an insect were four times more numerous in one stratum 
than in a second, then, since the standard deviation should vary approximately 
as the root of the mean number of insects per sampling-unit, the first stratum
should be sampled twice as heavily as the second. Whether an efficient
apportionment could be made on such a basis would have to be tested in practice.
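The square-root rule described above can be written down in a few lines. The sketch below is an illustration only: it assumes, as the text does, that the standard deviation varies as the root of the mean count, and the function name, the total of 90 sampling-units and the two strata are hypothetical.

```python
import math


def allocate_by_judged_abundance(m0, relative_abundance):
    """Share m0 units among strata in proportion to sqrt(judged abundance)."""
    weights = [math.sqrt(a) for a in relative_abundance]
    total = sum(weights)
    return [m0 * w / total for w in weights]


# One stratum judged four times as heavily infested as the other.
shares = allocate_by_judged_abundance(90, [4, 1])
print(shares)
```

The first stratum receives twice the share of the second, in agreement with the example of the field man given in the text.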


APPLICATION OF RESULTS


In order to illustrate the application of the foregoing work it is supposed 
that one wish to make an estimate of the number of beetles in the area considered 
in the present paper. The practical procedure to be followed is indicated below. 

It is necessary in the first place to fix the type of sampling-unit to be employed. 
For the present illustration it is supposed that one choose type no. 1 of Table I. 
One must fix randomly the position within the strata or field of the sampling- 
units to be examined. The choice involved may be made by using random 
sampling numbers, Tippett (1927). 

In the second place it is necessary to fix the fraction of the area to be sampled, 
that is to choose a value of m_0 in relation to M_0. In the present illustration two
cases are considered, the first where 25 % of the total area is comprised in the 
sample, and the second where approximately 15 % is comprised. 

In the third place one must decide how the work of sampling shall be ap- 
portioned. This work can be done without stratification and with stratification. 
With stratification it can be done with m; constant for all strata and also with 
m_i approximately proportional to σ_i.

Whatever method of drawing sampling-units be employed, the total number 
is 576, if 25 % of the total area is to be examined. In the case where no stratifica- 
tion is employed the sampling-units may simply be drawn successively and 
independently. The procedure is equally simple if 36 sampling-units are drawn 
from each of the 16 strata previously discussed. In both cases, F, the estimate 
of total population, may be made from equation (2). If approximately 15% 
of the total area is to be examined, then with no stratification 336 sampling- 
units must be chosen or with stratification 21 sampling-units must be examined 
in each stratum. The case, however, where an attempt is made to secure values 
of m_i, approximately proportional to σ_i, requires more detailed discussion. From
this detailed discussion the procedure for the more simple cases will be 
obvious. 

It is supposed that in the present population study, preliminary and
supplementary samplings are possible. In order to make a well apportioned sample
of 25% a preliminary sample of 15 sampling-units per stratum is made first.
For an illustration consider the procedure in the first stratum, for which the
counts are shown in the upper left-hand corner of Table VI. The position of
any sampling-unit may be represented by one number indicating the column
and another the row in which it lies. In such terms, fifteen positions, drawn
randomly without replacement, are: 2-3, 2-5, 2-8, 3-4, 3-5, 4-3, 4-8, 6-12,
7-6, 8-8, 8-11, 9-7, 9-8, 11-9, 12-8. Examining the sampling-units indicated
by these numbers one obtains the fifteen observations: 9, 5, 3, 7, 7, 8, 7, 8, 5, 14,
1, 7, 11, 10, 4. From these observations one calculates s_1 = 3.26, as shown in
Table IV. In that table there is shown for each stratum a value, s_i. The order
in which the strata are listed is down the columns taken from left to right in
Table VI.


It is now necessary to find values of m_i proportional to s_i so that
Σ_{i=1}^{16} m_i = 576, since a 25% sample is to be taken from the 2304
sampling-units in the whole area. Thus for the first stratum, s_1 = 3.26 and,
since Σ_{i=1}^{16} s_i = 47.87,


m_1 = (3.26/47.87) 576 = 39,


as shown in Table IV. In that table there is shown for each stratum a value, m_i.
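The arithmetic for the first stratum can be retraced as follows. This is a sketch rather than the author's own computation: the fifteen counts, the total 47.87 and the sample of 576 are taken from the text, whereas the use of the divisor n - 1 for the standard deviation is an assumption, adopted because it reproduces the quoted value of 3.26.

```python
import statistics

# Fifteen preliminary counts from the first stratum, as listed in the text.
obs = [9, 5, 3, 7, 7, 8, 7, 8, 5, 14, 1, 7, 11, 10, 4]

# Sample standard deviation with divisor n - 1 (assumed; see lead-in).
s1 = statistics.stdev(obs)

N_STRATA = 16
UNITS_PER_STRATUM = 144                      # each stratum is twelve units square
total_units = N_STRATA * UNITS_PER_STRATUM   # sampling-units in the whole area
m0 = total_units // 4                        # a 25% sample

SUM_S = 47.87                                # sum of the sixteen s_i, quoted in the text
m1 = round(round(s1, 2) / SUM_S * m0)        # share of the first stratum
supplementary = m1 - len(obs)                # further drawings needed in stratum 1

print(round(s1, 2), total_units, m0, m1, supplementary)
```

The values obtained agree with those quoted in the text: s_1 = 3.26, 2304 sampling-units in all, a sample of 576, m_1 = 39, and hence 24 supplementary drawings in the first stratum.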

In order to make a sample of 15%, a preliminary sample of 6 sampling-units
per stratum is made first. The procedure is similar to that shown above for a
sample of 25% with a preliminary survey of 15 sampling-units. The values, s_i,
and the corresponding values of m_i are shown in Table IV.


TABLE IV 


Numerical results in the process of sampling with m_i
approximately proportional to σ_i


| 2 | 13 | 4 | 15 | 16 

o; | 5.80 | 2-77 | 348 | 3-21 6-36 | 2-78 | 3-76 | 2-09 3-27 | 4-57 | 2-35 | 1-34 | 2-67 | 3-14 51 0, 
Preliminary sample of 15 for a sample of 25% 

8; ign 2-97 | 4-19 | 2:28 | 1-68 | 2-89 | 2-08 | 1-74 | 2463 

m, 39| 36| 35| 42| 63| 34) 34] 36| 50) 27) 20| 35] 25 21 | 32 

Sz, | 284| 177 | 215 | 180 | 597 | 156 | 272 | 117 | 164 | 332 | 116 | 42| 115 | 136| 49 126 
= Preliminary sample of 6 for a sample of 15% 

s, | 5-22 | 1-94 | 3-95 | 2-68 | 5-01 ie | 8-94 | 0-55 | 1-08 | 3-44 | 1-75 | 1-17 | 1-67 | 1-79 | 2461 | 279 

16) 31) 21) 40) 23) 31) 21) 22 

| 59 | 177 | 70 | 361 | 104 | 190] 28) 48) 147 | 42) 15) 06 97 


The preliminary drawings must be supplemented to make m_i as great in
each case as is required in Table IV. Thus, in the case of the first stratum when
a sample of 25% is desired, m_1 = 39. Accordingly, it is necessary to make a
supplementary sample of 24 sampling-units, and the previous random drawing
without replacement must be continued. Twenty-four such sampling-units, in
the terms previously employed, are: 1-1, 1-2, 2-7, 2-11, 3-1, 3-2, 3-3, 4-5,
4-11, 5-4, 5-11, 6-2, 6-5, 6-6, 6-8, 7-11, 8-2, 8-7, 9-9, 10-9, 11-2, 11-4, 12-5,
12-11. By reference to Table VI, the observations corresponding to these positions
can be discovered; thus one obtains: 2, 0, 10, 12, etc. Over all the strata 336
supplementary sampling-units must be found and then, from equation (2), F can
be calculated on the basis of 576 sampling-units. In finding F, one must estimate
the mean for each stratum; for instance, find that


x̄_1 = (Σ_{j=1}^{m_1} x_{1j}) / m_1 = 284/39 = 7.28.


From the values of Σ_{j=1}^{m_i} x_{ij} in Table IV, it can be seen that


F = 144(284/39 + 177/36 + 215/35 + ... + 126/32) = 11,305.

When a sample of 15% is desired, 241 supplementary drawings must be made
over all the strata so that F can be calculated on the basis of 336 sampling-units.
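The form of the estimate used above, F equal to the number of sampling-units per stratum multiplied by the sum of the stratum means, can be sketched as a small function. The function names are hypothetical, and the form is inferred from the worked expression for F rather than from equation (2) itself. Only the first stratum is evaluated, since the complete set of sixteen pairs of totals and sample sizes is not restated here.

```python
def stratum_mean(total_count, m):
    """Mean count per sampling-unit in one stratum."""
    return total_count / m


def estimate_total(units_per_stratum, totals, sizes):
    """F = M * (sum of stratum sample means), for strata of equal size M."""
    if len(totals) != len(sizes):
        raise ValueError("one total and one sample size are needed per stratum")
    return units_per_stratum * sum(
        stratum_mean(t, m) for t, m in zip(totals, sizes)
    )


# First stratum only: 284 beetles counted on 39 sampling-units.
mean_1 = stratum_mean(284, 39)
contribution_1 = estimate_total(144, [284], [39])
print(round(mean_1, 2), round(contribution_1, 1))
```

Passing all sixteen totals and sample sizes to `estimate_total` would give the estimate for the whole area in the same way.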

It may be of interest to note with what accuracy X would be estimated by
each of the three methods, first, sampling without stratification, second, uniform
sampling with stratification and third, sampling within strata in proportion to
the values of s_i in Table IV. Accordingly, σ_F for each method and also the minimal
value of σ_F, which is obtained when m_i is proportional to σ_i, are shown in Table V.
It should be noted that σ_F has a fixed value for a fixed amount of sampling in
the first two cases, but in the third case the value of σ_F depends upon the particular
values of m_i shown in Table IV and so, if fresh values of s_i were calculated, the
values of m_i and of σ_F would probably differ from those in the third column
of Table V.


TABLE V

The value of σ_F obtained from various methods of sampling


                                            Stratification  Stratification  Stratification
                               Without        with m_i        with m_i        with m_i
                            stratification    constant      proportional    proportional
                                                              to s_i          to σ_i

Preliminary sample of 15        322.0           280.6           268.1           260.7
for a sample of 25%

Preliminary sample of 6         449.9           392.1           403.4           367.8
for a sample of 15%


From the values of σ_F in Table V it can be seen that a considerable
improvement in the estimate, F, was obtained by stratification. In the attempt to
secure further improvement by making m_i proportional to s_i, rather than constant,
the results are such as would be anticipated from the discussion of the previous
section for, on the one hand, when the preliminary survey consists of 15
sampling-units, an improvement occurs, but, on the other hand, when only of 6
sampling-units, the results are not as good as those from general uniform
sampling. The final column in the table shows the limit of accuracy that would be
attained were the values of σ_i known. In the case of the preliminary samples of
15 the guesses at m_i obtained from s_i have been good enough to make the value
of σ_F approach fairly closely to its ideal minimum of 260.7.


SUMMARY AND CONCLUSIONS


The discussion of Neyman (1934), on making estimates of population, has 
been applied to the entomological problem of finding the number of the Colorado 
potato beetle, Leptinotarsa decemlineata Say, on a heavily infested field. Obser- 
vations were made on the population of beetles per 2 ft. of row of potatoes for 
entire rows in the area considered. From these observations, sampling-units 
were variously formed and their relative desirability studied. 

Neyman’s general equations are simplified and computation lightened when 
the strata, or subdivisions of the field, can be made equal. In the field considered, 
it was found that the variability was much greater between strata than within 
strata. Although this variability was due to some extent to differences in the 
work of the different observers, for the purpose of discussion the entire variability 
was regarded as real. It was found that a marked reduction in the area which 
must be examined to secure a given degree of accuracy in the estimate of popu- 
lation could be secured, first, by stratification and secondly, by making the 
number of sampling-units examined in a given stratum proportional to the 
standard deviation for the sampling-units in that stratum. These standard 
deviations will not be, in general, known, and the experimental data have been 
used to illustrate how their values may be replaced by estimates obtained from 
a preliminary survey, on the lines suggested by Sukhatme (1935). 

In the present case it was found that, if the total sampling were to amount
to 25% of the whole, then a preliminary survey involving the selection of
15 sampling-units per stratum (10.4% of the whole) would have led to a definite
reduction in the standard error of estimate (as shown in Table V). Were the total
sampling to amount to only 15% of the whole, a preliminary survey of 6
sampling-units per stratum (4.2% of the whole) was found to be inadequate.

The main object of this paper has been to investigate, from a statistical point
of view, the consequences of applying certain sampling methods to an insect
population. The question of how far, in following the Neyman-Sukhatme method,
any extra inconvenience due to unequal numbers of sampling units per stratum
or the need for preliminary sampling, would be justified in practice by the extra
accuracy gained, is of course a matter requiring fuller consideration by the
entomologist.


APPENDIX ON PRIMARY DATA


The number of beetles per unit, over the area examined, is shown in Table VI. 


As previously discussed, the area was broken into sixteen subareas, twelve units 
square. The limits of these subareas are indicated by straight lines. 




THE COMPARATIVE ADVANTAGES OF SYSTEMATIC AND 
RANDOMIZED ARRANGEMENTS IN THE DESIGN OF AGRI- 
CULTURAL AND BIOLOGICAL EXPERIMENTS 


By F. YATES 


1. INTRODUCTION


Ever since the introduction of the principle of randomization into replicated 
experiments it has been realized that certain of the arrangements generated by 
the randomization process were likely to be less accurate than others; conse- 
quently there has always been a conflict whether the general efficacy of a set of 
experiments might not be improved by the rejection of those arrangements 
which appeared a priori less accurate. In its extreme form such a procedure
results in the rejection of the principle of randomization altogether, and the 
selection of one or more arrangements which on account of their balance or other 
features especially appeal to the experimenter. 

Those who habitually use random arrangements may have thought that the 
issue was finally settled in favour of randomization, but the recent recrudescence 
of the dispute in the scientific journals, and the continued use of systematic 
arrangements in agricultural field trials, make it clear that there is still a 
considerable body of opinion which favours such arrangements. A review of the
recent arguments, and a re-examination of the numerical material cited in 
support of them, may therefore be of value. 


2. STATISTICAL PRINCIPLES 


The statistical treatment of the results of replicated experiments is usually 
based on the assumption of the normal law of error, and the general structure 
of the analysis is derived by the method of least squares. 

The method of least squares was originally developed by Gauss, for the 
purpose of deriving the best estimates of unknown quantities from observational 
material in astronomy and geodesy. Gosset’s discovery of the t distribution,
and Fisher’s extension to the z distribution, have provided exact tests of 
significance when, as usually occurs in practice, the degrees of freedom for error 
are few. The introduction of the procedure of the analysis of variance by Fisher 
has also considerably facilitated the arithmetical computations, particularly in 
the type of results that arise from planned experiments. 

These modern advances, in their turn, have led to the wider recognition in 
practical work of the many different sources of variation to which experimental 
and observational material is subject. Without exact tests of significance and
the technique of the analysis of variance the assessment of these various
components of variation would be a difficult and involved business. 

For its correct application the method of least squares requires that any 
components of variation which are not eliminated by the design shall be 
normally and independently distributed. Now it is immediately evident that the 
yields of agricultural field plots (even after allowing for the effects of local 
control, such as blocks) are not independently distributed. Neighbouring plots 
tend to be positively correlated. This destroys the whole theoretical basis of the 
method of least squares, and in particular is liable to vitiate completely the 
estimates of error and tests of significance. 

The difficulty can be met, as Fisher perceived, by the introduction of 
randomization into the design. This has the effect of removing the disturbance 
due to the correlation of neighbouring plots, so that yields can be treated as if 
their errors were uncorrelated. Adequate randomization requires that if all 
possible arrangements generated by the randomization process are put down in 
turn on the same set of yields (such as those from a uniformity trial), then the 
average of the mean squares for the (dummy) treatments is equal to the average 
of the mean squares for error. If this condition is not fulfilled it can easily be 
shown that certain types of correlation in the original material will give rise to 
biases in the estimate of error; such biases, if they exist, cannot fail to disturb 
the ordinary tests of significance. 
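The condition stated above can be verified directly on a small example. The sketch below is an illustration under stated assumptions: it takes a randomized-block layout in which the treatments are permuted independently within each block, the yields are invented and are not from any uniformity trial, and the function names are hypothetical. Every possible arrangement is laid on the same yields, and the two mean squares are averaged over all of them.

```python
from itertools import permutations, product


def mean_squares(yields, arrangement):
    """Treatment and error mean squares for one randomized-block arrangement.

    yields[i][p] is the yield of plot p in block i; arrangement[i][p] is the
    (dummy) treatment laid on that plot.
    """
    b, t = len(yields), len(yields[0])
    grand_mean = sum(sum(block) for block in yields) / (b * t)
    treatment_totals = [0.0] * t
    ss_within_blocks = 0.0
    for block, layout in zip(yields, arrangement):
        block_mean = sum(block) / t
        for y, treatment in zip(block, layout):
            treatment_totals[treatment] += y
            ss_within_blocks += (y - block_mean) ** 2
    ss_treatments = b * sum(
        (total / b - grand_mean) ** 2 for total in treatment_totals
    )
    ss_error = ss_within_blocks - ss_treatments
    return ss_treatments / (t - 1), ss_error / ((b - 1) * (t - 1))


def average_over_all_arrangements(yields):
    """Average both mean squares over every arrangement of the design."""
    b, t = len(yields), len(yields[0])
    layouts = list(permutations(range(t)))
    sum_treat = sum_err = 0.0
    count = 0
    for arrangement in product(layouts, repeat=b):
        ms_treat, ms_err = mean_squares(yields, arrangement)
        sum_treat += ms_treat
        sum_err += ms_err
        count += 1
    return sum_treat / count, sum_err / count


# Invented yields with a fertility trend along each block.
UNIFORMITY_YIELDS = [
    [10.0, 12.0, 15.0],
    [11.0, 14.0, 18.0],
    [9.0, 10.0, 14.0],
]

avg_treat, avg_err = average_over_all_arrangements(UNIFORMITY_YIELDS)
print(avg_treat, avg_err)
```

The two averages agree, although the yields within a block are plainly not independent. A single fixed arrangement used on every occasion carries no such guarantee.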

Systematic arrangements lack this necessary element of randomization, and 
consequently their analysis by the method of least squares can never have the 
same objective validity as has the similar analysis of random arrangements. 
It is sometimes contended that the latter analysis is not really valid because 
the original material is not normally distributed, or in some other way fails to 
satisfy the conditions required for analysis by least squares. Actually, however, 
it is known that the majority of material that the experimenter has to handle 
does fulfil the required conditions sufficiently nearly, provided that a proper 
process of randomization is adopted. Consequently this contention must be 
regarded rather as a debating point than as a serious objection which will in 
its turn justify the abandonment of randomization. 


3. THE ADVANTAGES AND DISADVANTAGES OF SYSTEMATIC ARRANGEMENTS 


The advantages claimed for systematic arrangements of the “balanced” type
are that they give more accurate results than do random arrangements, and that 
they are more easy to carry out in the field. 

The disadvantages are as follows: 

(1) There can be no assurance that the estimate of error is unbiased, however 
this estimate is arrived at, and the objectivity of the tests of significance is 
consequently lost. 

(2) Many different methods of estimating the error can reasonably be
advocated, so that the tests of significance are not even unique. 

(3) The comparisons of different pairs of treatments are subject to different 
errors, so that even if the estimate of error is reasonably unbiased, it cannot be 
used to test individual differences. 

(4) Biases may be introduced into the treatment means, owing to the pattern 
of the systematic arrangement coinciding with some fertility pattern in the field, 
and this bias may persist over whole groups of experiments owing to the arrange- 
ment being the same in all. Competition between plots with different treatments 
which always fall next to one another may produce similar effects.

The first disadvantage is admitted by some, but by no means all, of the 
advocates of systematic arrangements. Coupled with the admission is usually 
the plea that fully unbiased estimates of the errors of single experiments are not 
really required. Thus Gosset (1937) has discussed the point at some length, but 
Neyman (1937) has presented results which purport to show that two tests 
of significance which have recently been proposed for the half-drill strip arrange- 
ment are substantially accurate. 

Gosset also recognized the second disadvantage, but maintained that it 
applied only to the half-drill strip method, whereas he himself provided an 
example in the same paper of another type of systematic arrangement for which 
many different methods of estimating the error immediately suggest themselves. 

The third disadvantage has, so far as I know, never been fully recognized by 
the advocates of systematic arrangements. Indeed it is sometimes claimed to 
be an actual advantage. 

The fourth disadvantage has, of course, been recognized for a long time, but 
the advocates of systematic arrangements have always maintained that the 
danger (except possibly in rare instances) can be avoided by care and foresight 
on the part of the experimenter. 

It is very difficult to refute or substantiate this last claim, since from the
nature of the case any biases that are in fact introduced will not be recognized 
as such, being attributed to treatment effects. Clearly biases affecting a whole 
group of experiments can be avoided by re-randomizing the treatments in each 
experiment, though this results in some loss of simplicity in execution, and is 
by no means always done. It is worth noting, however, that randomization 
appeals to practical experimenters more because it eliminates biases in the 
treatment means than because it provides a valid estimate of error. It is only 
when faced with the task of reducing and co-ordinating the results of large 
numbers of experiments, and of increasing the efficiency of future experiments, 
that they fully appreciate the existence of such estimates. 

In short, randomization provides an assurance, not only to the experimenter, 
but to others who may be more sceptical than he, that the magnitude of the 
ordinary sources of disturbance, other than those eliminated by the arrangement,
has been evaluated by means of the estimate of error. It does not, of course,
provide a panacea which removes all need for care and foresight: it cannot take 
account of types of disturbance which act selectively on the various treatments 
or varieties (e.g. bird damage), and a badly planned or carelessly executed 
experiment will still be inaccurate even though it is randomized, but the 
experimenter will at least know of its inaccuracy. 

In co-operative experiments carried out by a number of workers at different 
places, where close supervision is frequently both difficult and expensive, and 
many of the workers have little training in experimental work, this assurance 
is doubly valuable. 

The real question at issue, therefore, is whether the gain in accuracy and 
simplicity of execution are of such magnitude that they outweigh the manifest 
disadvantages of systematic arrangements. The advocates of systematic 
arrangements have claimed that the gain in accuracy, at least in certain cases, 
is very considerable: thus Hudson, quoted by Gosset in an appendix to his 
paper (1937), gives a set of comparisons between random and systematic
arrangements in which the random arrangements gave, on the average, only 
one half the information that was yielded by the systematic arrangements. 

Hudson’s investigation, however, cannot be regarded as sufficiently extensive, 
or sufficiently representative of ordinary experimental practice, to provide a fair 
estimate of the gain in accuracy due to systematic arrangements. In practice 
the average gain will probably be found to be decidedly smaller. Indeed in a line
of research in which the experimental technique is actively developing, the
random arrangements actually used are likely to be more accurate than the
accepted systematic arrangements, since the unequivocal information that
is provided on the error of each experiment as it is conducted, itself leads to 
advances in technique which far outweigh the small gain that might theoretically 
result from the use of some especially favourable systematic arrangement. 

In the subsequent sections of this paper the above points will be examined 
in more detail. In the next section the special points that arise in connexion
with the half-drill strip method will be discussed. I have chosen this particular 
type of systematic arrangement, not because I consider it of special importance, 
but because it was Gosset’s advocacy of this arrangement that gave rise to the 
recent controversy, and because it does provide an excellent example of the 
many defects of even the simplest of systematic arrangements. 


4. BARBACKI AND FISHER’S TEST OF THE HALF-DRILL STRIP METHOD 


Following Gosset’s advocacy of a new method of estimating the error of 
half-drill strip arrangements, with his general endorsement of the utility of this 
design (1936), Barbacki & Fisher (1936) imposed a half-drill strip arrangement 
on the yields of a uniformity trial on wheat, reported by Wiebe (1935). 

Wiebe’s trial consisted of 125 rows, harvested in 15 ft. lengths, twelve from 
each row, each length being separated from the next in the row by a path.
Barbacki & Fisher grouped these rows by sixes, omitting one row between each 
pair of groups so as to simulate a half-drill strip design. Thus from the first 
104 rows they obtained eight pairs of half-drill strips, or four sandwiches 
(ABBA), for each 15 ft. length, i.e. forty-eight sandwiches in all. These forty- 
eight sandwiches they treated as independent and compared the mean difference 
(A - B - B + A) with its variance estimated from the forty-eight values in the
forty-eight sandwiches, obtaining a highly significant result (t = 2.50).

Moreover, having demonstrated the bias in the mean of the forty-eight 
systematic sandwiches, they proceeded to consider the accuracy of arrangements 
in which each of the forty-eight sandwiches was randomized independently 
(ABBA or BAAB), and also of arrangements in which each of the ninety-six 
pairs of half-drill strips was randomized independently (AB or BA). They 
concluded, inter alia, in the summary of their paper: 

“2. Using an extensive uniformity test it is found that the arrangements 
randomizing either pairs or sandwiches of half-drill strips give smaller errors 
than the systematic arrangement advocated as more precise. 

“3. As a consequence experimenters using the systematic arrangement* 
systematically underestimate their errors.” 

Gosset (1937) severely criticized this procedure, pointing out that one of 
the reasons for the significance attained in the systematic arrangement was the 
high correlation between parts of the same strip, which Barbacki & Fisher 
treated as independent, that the random arrangements gave more precise results 
because they were made up of smaller plots than the systematic arrangement, 
and that in any case the generalization from the results of a single trial contained 
in the third paragraph of the summary was unjustified. 

There is some substance in these criticisms. As regards the second point, 
it is clear that some balanced arrangements are likely to be less accurate than 
others, and it may be legitimately contended that Barbacki & Fisher were 
comparing their random arrangements with a balanced arrangement which was 
not the best that could be devised, given these unit plots. Gosset does in fact 
suggest a balanced arrangement which compares favourably in accuracy (in this 
one trial) with the random arrangements. (This arrangement is discussed in § 7.)

The generalization in the third paragraph of the summary was clearly 
somewhat sweeping if it was intended to apply to a half-drill strip arrangement 
on any field. It seems reasonable to suppose, however, that all that Barbacki 
& Fisher had in mind was experiments on this particular field. (They had given 
an example of a set of six such experiments in their paper.) Actually, as will
be shown later, the generalization does appear to be true of half-drill strips in
general.


* Not “arrangements” as quoted by Gosset. The paragraph cannot therefore refer to any 
arrangement other than the half-drill strip. 
In a way Gosset’s first criticism is also a fair one, but it is a two-edged
weapon, for it serves to emphasize how entirely arbitrary are the conventions 
that are ordinarily adopted in the calculation of the error of half-drill strip 
arrangements. 

Looked at objectively the half-drill strip arrangement is really equivalent
to alternate whole drill strips of the two varieties, with the additional defect 
that each variety is drilled by one-half of the drill only, so that any fault in the 
drill, e.g. a stopped coulter, will favour one variety at the expense of the other. 
The arbitrary division of each drill strip of each variety into two halves 
necessitates a special adaptation of the drill, and the harvesting of nearly twice 
the number of plots that would have to be harvested if alternate drill strips 
were sown. Yet though the experimenter habitually puts himself to considerable 
trouble to divide each drill strip lengthwise, he may not divide it transversely, 
for as Gosset says: “since such ‘sheaf weights’ may be positively correlated
such a method of calculating the error is fallacious.” As will be shown later in
the paper, this longitudinal division of the drill strips is a hitherto unsuspected 
source of disturbance which tends to invalidate Gosset’s method of estimating 
the error. 

Having noted the high correlation between different parts of the same strip, 
Gosset examined Wiebe’s results in more detail. He noticed that every eighth 
row, beginning with the third, gave an exceptionally high yield. He pointed 
out that Wiebe was using an eight-row drill (a point Barbacki & Fisher seem to 
have overlooked) and attributed this irregularity to some defect of the drill. 
He appears to have thought this provided a complete answer to Barbacki & 
Fisher’s anomalous results. Actually, of course, the trial provides an excellent 
example of just that type of drilling defect which may completely vitiate the 
results of a half-drill strip experiment. 

Gosset also failed to notice that it provides an example of the type of 
fertility wave or other periodic variation* which may equally vitiate the results, 
though it should perhaps be stressed that he was still contemplating revision of 
his paper at the time of his death, and it is very probable that he might have 
modified it considerably had he lived to see it through the press. Had he carried 
out a fuller analysis, he would have found that Wiebe’s trial gives results 
which are far more unfavourable to the half-drill strip method than Barbacki 
& Fisher supposed. This examination is carried out in the next section. 


5. RE-EXAMINATION OF WIEBE’S UNIFORMITY TRIAL 


In view of the fact that Wiebe was using an eight-row drill, it would seem 
most reasonable to test the half-drill strip method with strips of four rows. 
This procedure has the additional advantage that it provides more numerical
material. Since nearly sixteen drill widths are available there is no need to
divide the rows into sections, and Gosset’s main objection to Barbacki &
Fisher’s analysis is overcome.

* Apparently in this case due to irregularities of drilling; see additional Note at end of paper.


TABLE I 


Yield of each row of Wiebe’s trial (units of 100 g.) 


                                   Rows
   1-16   17-32   33-48   49-64   65-80   81-96   97-112   113-125

     71      63      63      69      71      75       78        74
     76      66      65      70      73      80       77        72
     83      72      71      76      77      84       85        81
     71      62      62      67      67      73       72        70
    301     263     261     282     288     312      312       297

     73      59      61      65      69      74       69        67
     73      67      67      75      79      80       77        73
     68      61      63      69      72      74       73        67
     59      56      59      65      68      70       69        67
    273     243     250     274     288     298      288       274

     63      59      58      66      70      70       69        65
     72      65      66      72      76      75       74        71
     79      72      72      79      83      80       82        78
     65      62      63      67      74      74       75        67
    279     258     259     284     303     299      300       281

     64      61      60      65      71      71       75        70
     72      69      69      76      81      81       84       [76]
     67      61      66      70      76      76       72       [70]
     61      61      69      73      75      78       72       [70]
    264     252     264     284     303     306      303      [286]


In one respect the trial differs from a proper half-drill strip experiment: the
drilling was all in one direction,* so that the inequalities noted by Gosset in
the two halves of the drill will be eliminated from the results.

* This was ascertained by Neyman & Pearson (1937, p. 382).

Table I shows the total yield of each row, and of each set of four rows, in
units of 100 g. The high yields of the third and to a lesser extent the sixth row
of each drill width are immediately apparent. Fictitious values have been
inserted to complete the last drill width. Each of these is the mean of the other
seven values in the same line of the table.*

Differences of consecutive pairs of half-drill strips (taken in the same order
throughout) are shown in Table II. There is a tendency for these differences to
be positive, which may be explained by differences between the two halves of
the drill, referred to by Gosset. The even differences are also consistently less
than the odd ones. This indicates the existence of some form of periodic variation
with a period equal to two drill widths (see Note on p. 465 below). The whole
situation is illustrated in Fig. 1, which shows a graph of the yields of each set
of four rows.


TABLE II

Differences of half-drill strips

            Diffs. of                            Diffs. of
  Rows      half strips   A-B-B+A      Rows      half strips   A-B-B+A
  1-8          +28                     65-72          0
  9-16         +15          +13        73-80          0            0
  17-24        +20                     81-88        +14
  25-32         +6          +14        89-96         -7          +21
  33-40        +11                     97-104       +24
  41-48         -5          +16        105-112       -3          +27
  49-56         +8                     113-120      +23
  57-64          0           +8        121-128       -5          +28
                                       Total       +129         +127


Whatever the causes of these irregularities, their effect on the results of the 
half-drill strip lay-out is disastrous. The third and sixth columns of Table II
show the differences A - B - B + A for each sandwich, each of these values being
the difference of two consecutive values in the second or fifth column. Not one 
of them is negative. If they are treated as independent, we obtain the following 
analysis of variance: 


TABLE III

Analysis of the differences A - B - B + A

               D.F.   Sum of squares   Mean square
Varieties        1        2016.12        2016.12
Error            7         622.88          88.98
Total            8        2639.00


* These values were adopted in ignorance of the direction of drilling. 
This gives t = 4.76, corresponding to a probability of about 0.002. In other
words, only about one random arrangement in 500 may be expected to give
results as discrepant as does the systematic arrangement in this trial.
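
The arithmetic leading to Table III can be retraced from the four-row totals of Table I. The following is only a sketch: the totals are entered as read from Table I (the last three rows of the final drill width being the fictitious values), and the variable names are illustrative, not taken from any of the papers discussed.

```python
from math import sqrt

# Four-row (half-drill strip) totals from Table I, units of 100 g,
# in field order: rows 1-4, 5-8, ..., 125-128.
half_strip_totals = [
    301, 273, 279, 264,   # rows 1-16
    263, 243, 258, 252,   # rows 17-32
    261, 250, 259, 264,   # rows 33-48
    282, 274, 284, 284,   # rows 49-64
    288, 288, 303, 303,   # rows 65-80
    312, 298, 299, 306,   # rows 81-96
    312, 288, 300, 303,   # rows 97-112
    297, 274, 281, 286,   # rows 113-128 (partly fictitious)
]

# Difference of each consecutive pair of half-drill strips (first minus second).
pair_diffs = [half_strip_totals[i] - half_strip_totals[i + 1]
              for i in range(0, len(half_strip_totals), 2)]

# Sandwich value A - B - B + A: difference of two consecutive pair differences.
sandwiches = [pair_diffs[i] - pair_diffs[i + 1]
              for i in range(0, len(pair_diffs), 2)]

# Analysis of variance of the eight sandwich values, treated as independent.
n = len(sandwiches)
total_ss = sum(d * d for d in sandwiches)
varieties_ss = sum(sandwiches) ** 2 / n      # 1 degree of freedom
error_ss = total_ss - varieties_ss           # n - 1 degrees of freedom
error_ms = error_ss / (n - 1)
t = sqrt(varieties_ss / error_ms)

print("sandwich values:", sandwiches)
print("varieties S.S.:", round(varieties_ss, 2), " error M.S.:", round(error_ms, 2))
print("t =", round(t, 2))
```

Every sandwich value comes out zero or positive, which is the feature of the systematic lay-out on which the argument turns.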


TABLE IV

Gosset’s method of analysis

                     D.F.   Sum of squares   Mean square
Varieties              1        1008.06        1008.06
Mean difference*       1        1040.06        1040.06
Error                 14         990.88          70.78
Total                 16        3039.00


* Gosset’s fertility gradient. 


Fig. 1. Four-row means in Wiebe’s trial (horizontal axis: rows, in sets 1-16, 17-32, 33-48, 49-64, 65-80, 81-96, 97-112, 113-125).
Nor does Gosset’s proposed method of analysis help matters. This is shown
in Table IV, and is derived from the values in the second and fifth columns of 
Table II. The values of Table III must be divided by 2 to make them comparable 
with those of Table IV. 


This gives a value of t = 3.77, corresponding to a probability of about 0.007.
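
The entries of Table IV can be recovered from the sixteen half-strip differences. The sketch below assumes, as the table suggests, that one degree of freedom is taken out for the mean difference (Gosset's fertility gradient), one for varieties (the differences with alternating sign, as in an ABBA sequence), and the remainder is taken as error. The names are illustrative only.

```python
from math import sqrt

# The sixteen half-drill strip differences in field order (rows 1-8, 9-16, ...).
pair_diffs = [28, 15, 20, 6, 11, -5, 8, 0, 0, 0, 14, -7, 24, -3, 23, -5]
n = len(pair_diffs)

# In an A B B A sequence the variety comparison changes sign from one pair
# to the next, so the variety contrast is the alternating sum.
signs = [1 if i % 2 == 0 else -1 for i in range(n)]
variety_contrast = sum(s * d for s, d in zip(signs, pair_diffs))

total_ss = sum(d * d for d in pair_diffs)        # 16 degrees of freedom
varieties_ss = variety_contrast ** 2 / n         # 1 degree of freedom
mean_diff_ss = sum(pair_diffs) ** 2 / n          # 1 degree of freedom
error_ss = total_ss - varieties_ss - mean_diff_ss
error_ms = error_ss / (n - 2)                    # 14 degrees of freedom
t = sqrt(varieties_ss / error_ms)

print("varieties S.S.:", round(varieties_ss, 2))
print("mean difference S.S.:", round(mean_diff_ss, 2))
print("error M.S.:", round(error_ms, 2), " t =", round(t, 2))
```

The varieties sum of squares here is half that of Table III, in agreement with the remark that the values of Table III must be divided by 2 for comparison.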


6. FURTHER POINTS CONCERNING THE HALF-DRILL STRIP METHOD


The failure of the half-drill strip method in the above example, though 
spectacular, might be brushed aside as exceptional. Neyman (1937), however, 
has quoted results obtained by Mr Sekar, which tend to show that the method 
is slightly less accurate on the average than is indicated by the standard error 
estimated by the method of Table IV, not, as Gosset supposed, more accurate. 

Mr Sekar worked out values of t for 120 half-drill strip arrangements which
he superimposed on different uniformity trials. (I have no particulars of what
trials were used, but it is improbable that they were all on cereals, or that those
that were all provided plots that coincided with the actual half-drill strips.)

The distribution of the 120 values of t is shown in Table V.


TABLE V

Distribution of values of t in 120 half-drill strip arrangements

Limits of t       0-0.4  0.4-0.8  0.8-1.2  1.2-1.6  1.6-2.0  2.0-2.4  2.4-2.8  2.8-3.2
Nos.  Expected     36.0     30.3     21.9     13.9      8.1      4.5      2.4      1.3
      Observed       32       26       20       17       10        7        3        2
Discrepancy        -4.0     -4.3     -1.9     +3.1     +1.9     +2.5     +0.6     +0.7

Limits of t     3.2-3.6  3.6-4.0  4.0-4.4  4.4-4.8  4.8-5.2  5.2-5.6  Over 5.6
Nos.  Expected      0.7      0.4      0.2     0.10     0.07     0.03      0.06
      Observed        0        1        1        0        0        1         0
Discrepancy        +1.4 (all classes above 3.2 taken together)

There is a tendency to obtain too many large values of t, which, though not
very marked, appears to be significant.
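
The excess of large values can be seen by totalling the classes of Table V. The following sketch does no more than that; it is not the test of significance referred to above, which is not specified. Two readings of the table are assumptions: the expected number for the class 2.8-3.2 is taken as 1.3 (the value implied by the printed discrepancy of +0.7), and the final class is taken to be open-ended.

```python
# Lower limit of each class of t: 0.0, 0.4, ..., 5.6 (last class assumed open-ended).
lower_limits = [round(0.4 * i, 1) for i in range(15)]

# Expected and observed numbers as read from Table V.
expected = [36.0, 30.3, 21.9, 13.9, 8.1, 4.5, 2.4, 1.3,   # 1.3 is an inferred reading
            0.7, 0.4, 0.2, 0.10, 0.07, 0.03, 0.06]
observed = [32, 26, 20, 17, 10, 7, 3, 2,
            0, 1, 1, 0, 0, 1, 0]

# Discrepancy (observed minus expected) for each class.
discrepancy = [round(o - e, 2) for o, e in zip(observed, expected)]

# Totals for the classes with t of 2.0 or more.
large = [i for i, lim in enumerate(lower_limits) if lim >= 2.0]
observed_large = sum(observed[i] for i in large)
expected_large = sum(expected[i] for i in large)

print("discrepancies:", discrepancy)
print("values of t of 2.0 or more: observed", observed_large,
      "expected", round(expected_large, 2))
```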

Why is it that Gosset’s confident prediction of greater accuracy is not 
fulfilled, even when defects of drilling do not disturb matters? I think it is 
because the half-drill strip arrangement is, as already mentioned, really an 
arrangement in whole drill strips, and need not necessarily be expected to attain 
the full accuracy that could be obtained by using the half-drill strips in the most 
efficient manner, consistent with the requirements of randomization. It is very 
likely that randomized sandwiches, for example, are fully as accurate as 
systematic sandwiches, possibly even more accurate. 

The question is not of great practical importance, since plant breeders rarely 
want to test only two varieties, and immediately the number of varieties is 
increased comparison by pairs, under the conditions of agricultural experi- 
mentation, becomes decidedly less efficient than the use of more comprehensive 
arrangements. This point is discussed elsewhere (Yates, 1935). 


7. THE CHESSBOARD ARRANGEMENT


In place of Barbacki & Fisher’s random arrangements in sandwiches and 
pairs Gosset proposed a new balanced arrangement of the type shown in Fig. 2, 


4.8 8 A 
ABBAABBA 


Fig. 2. Gosset’s balanced arrangement. 


which is clearly, except at the edges, equivalent to a chessboard pattern with 
alternate squares (or rectangles) under the different varieties. Gosset claimed 
that this arrangement was likely to be more accurate, on the average, than 
Barbacki & Fisher’s random arrangements, and in this trial the actual mean 
difference between the two treatments happened to be small. The result, however, 
appears to be largely fortuitous. 

The design is of little practical importance, but it is interesting in that it 
provides a further illustration of the effect of arbitrarily splitting plots for the 
purpose of estimating the error. 

It is apparent that Gosset’s design can be regarded as made up of forty- 
eight 2 x 2 Latin squares 

    A B          B A
    B A    or    A B


with the restriction that neighbouring squares are always of opposite type, so 
that his plots are really four times the area of a unit plot. Has this restriction 
in fact increased the accuracy over what would be obtained if the type of each 
Latin square were assigned at random? 

The question cannot be answered with certainty from the material of a 
single uniformity trial, but certain indications can be obtained. Thus of the 
arrangements shown in Fig. 3 (all made up of 2x2 Latin squares), (1) is the 


    A B B A        A B B A        A B A B        A B A B
    B A A B        B A A B        B A B A        B A B A
    B A A B        A B B A        B A B A        A B A B
    A B B A        B A A B        A B A B        B A B A
      (1)            (2)            (3)            (4)


Fig. 3. Arrangements of 2 x 2 squares with varying degrees of balance. 


most balanced and (4) the least balanced, (2) and (3) being intermediate. 
Twelve such unit arrangements of any one type can be superimposed on the 
192 plots constructed from Wiebe’s trial by Barbacki & Fisher. If a random
choice is made for each of the twelve units between the arrangement and its
complement (i.e. with A and B interchanged), an arrangement giving a valid
estimate of error will result. If Gosset’s arguments are correct, the use of the
unit with most balance should give the lowest error. Actually the opposite is
the case, as is shown in Table VI.*
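
The randomization just described may be sketched as follows. The four unit arrangements are entered as read from Fig. 3; the function names and the use of a seeded generator are illustrative assumptions, and no yields are involved.

```python
import random

# The four 4 x 4 unit arrangements of Fig. 3, each built from 2 x 2 Latin squares.
UNITS = {
    1: ["ABBA", "BAAB", "BAAB", "ABBA"],
    2: ["ABBA", "BAAB", "ABBA", "BAAB"],
    3: ["ABAB", "BABA", "BABA", "ABAB"],
    4: ["ABAB", "BABA", "ABAB", "BABA"],
}


def complement(unit):
    """Return the unit with varieties A and B interchanged."""
    swap = str.maketrans("AB", "BA")
    return [row.translate(swap) for row in unit]


def is_made_of_latin_squares(unit):
    """True if each 2 x 2 quarter has A and B once in every row and column."""
    for r in (0, 2):
        for c in (0, 2):
            top, bottom = unit[r][c:c + 2], unit[r + 1][c:c + 2]
            if set(top) != {"A", "B"} or set(bottom) != {"A", "B"}:
                return False
            if top[0] == bottom[0]:
                return False
    return True


def randomized_layout(unit_type, n_units=12, rng=None):
    """Choose at random, for each unit, between the arrangement and its complement."""
    rng = rng or random.Random()
    base = UNITS[unit_type]
    return [base if rng.random() < 0.5 else complement(base)
            for _ in range(n_units)]


layout = randomized_layout(1, rng=random.Random(1939))
print(sum(unit == UNITS[1] for unit in layout), "of 12 units left uninterchanged")
```

Twelve units of sixteen plots account for the 192 plots; whichever choice is made for a unit, each variety occupies eight of its plots.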


TABLE VI

Variation associated with arrangements having varying degrees of balance

                                          Mean square
                                 D.F.     (units of a
                                          single plot)

Arrangement (1)                   12        29,120
Arrangement (2)                   12        10,399
Arrangement (3)                   12         9,207
Arrangement (4)                   12        10,915

Mean (2 x 2 Latin squares)        48        14,910
Randomized sandwiches             48        28,994
Randomized pairs                  96        49,106


Arrangements (2), (3) and (4) all show less variation than arrangement (1), 
the difference between (1) and each one of the others being significant. This 
implies, inter alia, that random 2 x 2 Latin squares are likely to be more accurate 
than Gosset’s more elaborately balanced arrangement. 

The power of the Latin square in increasing the accuracy is here demon- 
strated. Neither the randomized pairs nor the randomized sandwiches considered 
by Barbacki & Fisher are anywhere near so effective. 


8. HUDSON’S RESULTS


As already mentioned, the whole of Gosset’s case in favour of systematic 
arrangements did not rest on the half-drill strip arrangement. He claimed that 
balanced arrangements of all types gave substantial gains in accuracy, and he 
put forward the results of Hudson’s examination of certain uniformity trials 
as an example of this. 

* There are a few minor errors in the yields given by Barbacki & Fisher (their Table I), and
in their sums of squares, so that the values in the last two lines of Table VI are not in exact
agreement with the values given in their analyses of variance.

Hudson examined three uniformity trials, and his results, if taken at their
face value, show a very considerable gain in accuracy with systematic
arrangements. In Table VII (which also gives the main particulars of the arrangements
tested) the treatment mean squares of the systematic arrangements are expressed
as a percentage of the treatment + error mean squares of the corresponding
random arrangements. In only two cases are these percentages greater than
100, and their mean is 51.7, indicating that the random arrangements are on
the average giving about half the information given by the systematic
arrangements.
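
The summary figures quoted for Hudson's results can be checked directly from the last column of Table VII. The sketch below simply enters the fifteen percentages as read from that table; the variable names are illustrative.

```python
# Percentage information for the fifteen arrangements of Table VII
# (mangolds, sugar beet, potatoes, in the order tabulated).
percent_information = [
    20.0, 16.1, 46.0, 54.8, 52.3,            # mangolds
    108.0, 129.0, 58.8, 9.1,                 # sugar beet
    83.7, 61.1, 41.0, 25.7, 65.5, 4.1,       # potatoes
]

mean_percent = sum(percent_information) / len(percent_information)
over_100 = [p for p in percent_information if p > 100]

print("mean percentage:", round(mean_percent, 1))
print("cases above 100:", len(over_100))
```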


TABLE VII

Hudson’s trials

                                                     Plots
Trial                       Blocks  Treatments   Rows   Length    S.E.%      Percentage
                                                         (ft.)    per plot   information

Mangolds (Mercer & Hall):     20        4          3      60½       4.5         20.0
  60 rows of 302½ ft.         10        4          6      60½       3.3         16.1
  Unit plots:                 10        4          3     121        3.9         46.0
  3 rows of 30¼ ft.            8        5          3     151¼       3.8         54.8
                               4        5          6     151¼       3.1         52.3

Sugar beet (Immer):           20        6          1     165        5.7        108.0
  60 rows of 330 ft.          10        6          2     165        5.3        129.0
  Unit plots:                 10        6          1     330        4.9         58.8
  1 row of 33 ft.              4        6          5     165        5.1          9.1

Potatoes (Kalamkar):          32        6          1      66        5.9         83.7
  96 rows of 132 ft.          16        6          1     132        4.4         61.1
  Unit plots:                                             66        6.2         41.0
  1 row of 22 ft.              8        6          2     132        5.4         25.7
                                                          66        5.5         65.5
                                                   8      66       11.8          4.1


If this could be accepted as a true estimate of the average gain with 
systematic arrangements, it is clear that their advocates might make a strong 
case for their employment. The results are not very convincing, however. Only 
three uniformity trials are used, the plots in most of the arrangements are only 
one or two rows wide, and the random arrangements are in all cases randomized 
blocks. It is well known that Latin squares are in general substantially more 
accurate than randomized blocks, and examination of these systematic arrange- 
ments makes it clear that they are eliminating fertility differences in much the 
same way as do Latin squares. In any experiment in which the number of 
replicates is as great as the number of treatments one or more Latin squares 
would be the natural arrangement to adopt, and Hudson’s comparison of his 
balanced arrangements with arrangements in randomized blocks is consequently 
of little interest. Indeed the first arrangement for the mangolds is made up of 
repetitions of the special type of Latin square shown in Fig. 4, and there is no 
conceivable reason why an experimenter using a random arrangement for four 
varieties on these plots should not employ a Latin square.

Balanced arrangements of the type considered by Hudson may, however,
be more effective in reducing the variance between the treatment means when 
the number of replicates is smaller than the number of treatments, since Latin 
squares cannot then be used. A possible modification of the ordinary type of 
design in randomized blocks, based on the split-plot Latin square, which may 
be of use in multiple trials, and which preserves most of the “balance” of
Hudson’s arrangements, is considered in § 13. 


    1 2 3 4
    4 3 2 1
    2 1 4 3
    3 4 1 2


Fig. 4. Hudson's systematic square. 


9. TEDIN’S INVESTIGATION 


So far as I know, the only comprehensive investigation of the precision of 
any systematic arrangement which has been published was that carried out by 
Tedin (1931). He compared the precision of the knight’s move or Knut Vik
5 x 5 squares, and the diagonal 5 x 5 squares, with that of randomized 5 x 5
Latin squares, using ninety-one 5 x 5 squares taken from eight uniformity trials.
No details are given as to size and shape of plots. 

The Knut Vik squares are special balanced 5 x 5 Latin squares in which the 
varieties are as evenly spaced as possible over the field. There are two such 


    A B C D E                        A B C D E
    D E A B C                        B C D E A
    B C D E A                        C D E A B
    E A B C D                        D E A B C
    C D E A B                        E A B C D

Fig. 5. The Knut Vik square.     Fig. 6. The diagonal square.


squares, conjugate to each other, which may be applied to any set of plots. 
One is shown in Fig. 5. The diagonal squares are the specially simple squares, 
one of which is shown in Fig. 6. 

Expressing the treatment mean square as a fraction of the corresponding 
treatment + error mean square, Tedin obtained the results shown in Table VIII. 


TABLE VIII

Mean relative errors of systematic and random squares

Knut Vik square of Fig. 5                         0.9132 ± 0.0599
Conjugate square                                  0.9108 ± 0.0622
Knut Vik squares                                  0.9120

Square of Fig. 6                                  1.0496 ± 0.0623
Conjugate square                                  1.1176 ± 0.0698
Diagonal squares                                  1.0836 ± 0.0468

Seven random squares (the same in each trial)     0.9651
The Knut Vik squares show a greater accuracy than expectation on random
theory, though the gain is nothing like so striking as in Hudson’s material.
Nevertheless the mean fraction 0.9120 is significantly less than unity, and would
indicate that the average gain in precision (though somewhat ill determined)
is of the order of 10%, and is certainly less than 20%. The diagonal squares
are, as might be expected, less precise than the random squares.
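
The order of magnitude of this gain follows from the two Knut Vik fractions of Table VIII. In the sketch below the gain in precision is taken as the reciprocal of the mean fraction, less one; that reading of "gain in precision" is an assumption made for illustration, not a definition given by Tedin.

```python
# Treatment mean square as a fraction of the treatment + error mean square
# for the two Knut Vik squares (Table VIII).
knut_vik_fractions = [0.9132, 0.9108]

mean_fraction = sum(knut_vik_fractions) / len(knut_vik_fractions)

# Assumed measure of gain: ratio of random to systematic variance, less one.
gain_in_precision = 1 / mean_fraction - 1

print("mean fraction:", round(mean_fraction, 4))
print("gain in precision: about", round(100 * gain_in_precision, 1), "per cent")
```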

What does this mean to the practical agronomist or plant breeder using
5 x 5 Latin squares? If he uses a random arrangement in place of a Knut Vik
square he will in effect be allocating two or three of the twenty-five plots to the
estimation of error. Thus he may be devoting, say, 5%* of his resources to
providing valid estimates of error, the elimination of unsuspected biases, and 
all the other advantages that accrue to random arrangements. The experience 
of those engaged in practical research would indicate that such an expenditure 
is entirely trivial in relation to the advantages gained. 

Tedin’s investigation applies to only one type of systematic arrangement. 
Obviously more comprehensive investigations could be undertaken, but it is 
doubtful whether they are worth while. The modern tendency in agricultural 
experiments is towards the greater use of factorial design, even in simple 
experiments involving only a few plots. Any attempt at “balancing” such
designs would lead to the utmost confusion, and would greatly reduce the value
of the results.


10. WHERE RANDOM ARRANGEMENTS FAIL 


It will be apparent, on consideration of the designs discussed in the previous 
sections, that certain types of balanced systematic arrangements are in general 
likely to be more accurate than the most suitable random arrangements on the 
same plots, because it is impossible to introduce the same degree of local control 
into random arrangements while still preserving an unbiased estimate of error. 

At first sight it might be thought that some improvement on ordinary Latin 
squares and randomized blocks should be possible. Thus the 4 x 4 square of
Fig. 4 possesses the property that all four treatments fall in the four 2 x 2 squares
which go to make up the larger square, and a random selection from all the
4 x 4 squares having this property might be made.
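
The property in question is easily verified mechanically. The sketch below enters the square of Fig. 4 as read from the figure and checks it; an ordinary cyclic Latin square is included for contrast. The function names are illustrative.

```python
# The 4 x 4 systematic square of Fig. 4 (treatments numbered 1 to 4).
FIG_4_SQUARE = [
    [1, 2, 3, 4],
    [4, 3, 2, 1],
    [2, 1, 4, 3],
    [3, 4, 1, 2],
]

# A cyclic Latin square, for comparison.
CYCLIC_SQUARE = [
    [1, 2, 3, 4],
    [2, 3, 4, 1],
    [3, 4, 1, 2],
    [4, 1, 2, 3],
]

TREATMENTS = {1, 2, 3, 4}


def is_latin(square):
    """Every treatment once in each row and once in each column."""
    rows_ok = all(set(row) == TREATMENTS for row in square)
    cols_ok = all({square[r][c] for r in range(4)} == TREATMENTS
                  for c in range(4))
    return rows_ok and cols_ok


def quarters(square):
    """Treatments falling in each of the four 2 x 2 squares."""
    return [
        {square[r][c], square[r][c + 1], square[r + 1][c], square[r + 1][c + 1]}
        for r in (0, 2) for c in (0, 2)
    ]


def has_balanced_quarters(square):
    """All four treatments fall in each of the four 2 x 2 squares."""
    return all(q == TREATMENTS for q in quarters(square))


print("Fig. 4:", is_latin(FIG_4_SQUARE), has_balanced_quarters(FIG_4_SQUARE))
print("cyclic:", is_latin(CYCLIC_SQUARE), has_balanced_quarters(CYCLIC_SQUARE))
```

Both squares are Latin, but only the first has the additional balance, so the restriction does exclude squares that ordinary randomization would admit.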

Unfortunately such an arrangement does not furnish a valid estimate of
error, for though it is possible to eliminate the three degrees of freedom
representing the contrasts of these squares, thus satisfying the least square conditions
(two of the degrees of freedom are included in rows and columns, and the
remaining one is orthogonal to rows and columns), the resultant estimate of
error is still biased, because the condition stated in § 2 is not fulfilled. An
unbiased estimate can only be obtained by making two separate estimates of
error, one for the contrasts (1)-(4) of Fig. 7, and the other for the contrasts
(5)-(8). Clearly the second set of contrasts is likely in general to be less variable
than the first.

* Note that an increase of 10% in the number of plots in an experiment does not increase the
work by 10%.
Fig. 7. Contrasts in a 4 x 4 square with balanced corners: eight patterns of + and - signs, numbered (1)-(8).


Of the twelve possible patterns of treatments the four shown in Fig. 8 are 
such that the treatment degrees of freedom can be partitioned into two from the 
first group of contrasts and one from the second. For such patterns the partition 
of the degrees of freedom in the analysis of variance would be as in Table IX. 


(1) (2) (3) (4) 


Fig. 8. Treatment patterns for a 4x4 Latin square with balanced corners. 


In the remaining eight possible patterns the partition cannot be performed
in this simple manner, but the expectation of the total treatment sum of squares
is as before twice the error mean square (a) plus once the error mean square (b).
Thus the ordinary pooled estimate of error based on 5 degrees of freedom would
be biased.


TABLE IX

Analysis of variance of a 4 x 4 square with equalized corners

                  D.F.
Rows                3
Columns             3
Corners             1
First group         2
Error (a)           2
Second group        1
Error (b)           3
Total              15


By the exclusive use of the patterns of Fig. 8, and the analysis of Table IX, 
an unbiased estimate of error could be obtained. The procedure has obvious 
disadvantages in such a small square, but may sometimes be of value in larger
squares, although an exact test of significance for the whole group of treatments 
will no longer be available. A similar type of arrangement, the Graeco-Latin 
square, has been proposed (Yates, 1937) for eliminating the bias inherent in the 
semi-Latin square. Split-plot and semi-Latin squares are discussed further in §13. 
Many of the modern devices, such as confounding in factorial design, and the 
quasi-factorial methods of arranging variety trials, serve a similar purpose, 
introducing a greater degree of local control than is provided by arrangements 
in ordinary randomized blocks. 

Arrangements (3) and (4) of Fig. 8 are Hudson’s balanced squares. These 
possess certain features of balance which are not possessed by squares (1) 
and (2): in particular every treatment occurs once at a corner. This emphasizes 
the inescapable fact that some sacrifice must be made in order to obtain a valid 
estimate of error, for it is only by ensuring that the component contrasts which 
make up the set of results shall be allotted with appropriate frequencies to both 
treatments and error that we can estimate the error: if one special set of 
contrasts believed to be more accurate than all the others is always allotted to 
treatments then no valid estimate of error can be possible. 


11. EFFECT OF BIAS IN THE ESTIMATE OF ERROR ON TESTS OF SIGNIFICANCE


Gosset (1936, 1937) has argued at some length that the biases introduced into
the tests of significance by defective estimates of error are of little consequence, 
or indeed are an advantage, provided the estimates of error tend to be too large. 
He pointed out that if the real error is decreased, and the estimate of error 
correspondingly increased, the ultimate outcome will be that small effects will 
be judged significant less frequently than they should be, but that this will be 
compensated for by the greater frequency with which large effects are judged 
significant. Pearson (1938) has given further illustrations of the same point.

In the present paper the comparison between systematic and random 
arrangements has been approached from the point of view of accuracy. It is 
perhaps worth noting that the effect on the tests of significance of biases in the 
estimate of error is merely equivalent to changing the level of significance. 

Thus, for example, if in a series of experiments the estimates of error
variance are double what they should be, the estimate $x$ of a treatment effect
will have an estimated variance $s^2$ which is biased by a factor 2, so that
$\sqrt{2}\,x/s$ will be distributed as $t$, i.e. $x/s$ will be distributed as $t/\sqrt{2}$. With 11 degrees of
freedom the 1% point of $t$ is 3.106, and therefore the 1% point of $t/\sqrt{2}$ is 2.196.
The 5% point of $t$ is 2.201, and the effect on the test of significance is therefore
the same as would be produced by substitution of the 1% point for the 5%
point and the use of a correct estimate of error.

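
The numerical coincidence in this example can be confirmed in a few lines. The two percentage points of t for 11 degrees of freedom are those quoted in the text; the variable names are illustrative.

```python
from math import sqrt

# Percentage points of t for 11 degrees of freedom, as quoted in the text.
T_ONE_PER_CENT = 3.106
T_FIVE_PER_CENT = 2.201

# Factor by which the estimate of error variance is supposed to be too large.
bias_factor = 2.0

# If s^2 is too large by this factor, x/s is distributed as t / sqrt(factor),
# so the true 1% point of x/s is the 1% point of t divided by sqrt(factor).
one_per_cent_point_of_ratio = T_ONE_PER_CENT / sqrt(bias_factor)

print("1% point of x/s:", round(one_per_cent_point_of_ratio, 3))
print("nominal 5% point of t:", T_FIVE_PER_CENT)
```

A ratio judged just significant at the nominal 5% level is thus in reality very nearly at the 1% level.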

Consequently no new principle is introduced by Gosset’s approach. The
experimenter would do just as well if he admitted frankly that he believed his 
experiments to be decidedly more accurate than his estimates of error indicated, 
and allowed for this greater accuracy by the introduction of an appropriate 
factor (2 in the above example). He would then be at liberty to choose whatever 
level or levels of significance best suited his needs. Whether, of course, his choice 
of the numerical factor is even approximately correct remains in doubt: as we 
have seen, in the case of the half-drill strip arrangement, using Gosset’s method 
of calculating the error, the factor is in reality likely to be somewhat greater 
than unity. The issue would, however, be clearly defined. 

Actually the object of most agricultural experiments is the estimation of the 
magnitude of treatment effects and varietal differences, not the establishment 
of the existence of such effects, and the value of any estimates that are obtained 
is considerably increased if their standard errors are known, since fiducial 
limits may then be assigned to them. One has only to look through the 
literature of the subject to see how frequently, in the absence of such limits, 
theories are put forward which are in fact entirely untenable and merely serve 
to bring the whole of scientific agriculture into disrepute. 

On the other hand, it is of course wrong to maintain (and it has in fact never 
been maintained) that no conclusions can be reached from an experiment which 
does not provide a valid estimate of error. Such conclusions as are reached are 
less objective, and are more exposed to criticism; and many of the finer points 
that might have been elucidated, had valid estimates of error been available, 
must remain matters of pure speculation.


12. MULTIPLE TRIALS 


Most agricultural experiments are in fact repeated at different places and in 
a number of years, for it has long been realized that responses to fertilizers, 
differences between varieties, etc., vary substantially from year to year and 
place to place. In its fullest development this leads to multiple experiments, 
in which similar or identical trials are carried out at a considerable number of 
farms in the same year, and repeated in subsequent years. 

Since the comparison of the different trials itself furnishes an estimate of the 
variation to which the results are subject, it might be considered that no 
estimates of error are required for such trials. The estimates of error of the 
individual trials, however, are still of value. 


As an example of what is likely to occur in practice, we may consider the 
results of a set of variety trials on barley conducted in each of two years at six 
farms in the state of Minnesota and reported by Immer et al. (1934). At each
farm there were three replicates of each of ten varieties arranged in randomized 
blocks. The interpretation of part of the results of this set of trials has been 
discussed in detail elsewhere (Yates & Cochran, 1938). 

The combined analysis of variance published by Immer is shown in Table X. 


TABLE X 
Analysis of variance for twelve varietal trials 

                                D.F.    Mean square 
Places                            5       3980.31 
Years                             1       2541.90 
Places × years                    5       1261.33 
Varieties                         9        350.86 
Varieties × places               45         80.38 
Varieties × years                 9         69.92 } 
Varieties × places × years       45         43.90 }  combined: 48.24 
Error                           216         23.28 


In the above table varieties × years and varieties × places × years are not 
significantly different, and together may be taken to provide an estimate of the 
variation due to changes in weather conditions and changes of field, etc., in the 
two years. Their combined mean square, 48.24, is quite significantly above the 
error mean square. The magnitude of the variance due to these causes is 
estimated at 

$$\tfrac{1}{3}\,(48.24 - 23.28) = 8.32$$


for any one variety at one farm in a single year. This may be regarded as the 
effective error when we are considering the mean of a variety at any one place. 

If instead of the experiment being carried out in randomized blocks some 
form of systematic arrangement had been used, the error being estimated as if 
the arrangement were in randomized blocks, some reduction in the real 
experimental error variance might be expected. If this was 25%, all the mean 
squares of Table X would be reduced by $\tfrac{1}{4}(23.28)$, i.e. 5.82, except the 
error mean square, which would be increased by $\tfrac{1}{8}(23.28)$, i.e. 2.91. The 
combined interactions, varieties × years and varieties × places × years, would 
still be significant, but the magnitude of the additional variation would be 
estimated as 

$$\tfrac{1}{3}\,(42.42 - 26.19) = 5.41,$$

i.e. it would be underestimated by about one-third. Set off against this is the 
reduction in the effective error variance of the final results. The effective error 
is reduced in the ratio of 48.24 to 42.42, i.e. by 12%. Had the systematic 
arrangements been particularly successful and reduced the error variance by 50%, 
the effective error would have been reduced in the ratio of 48.24 : 36.60, i.e. by 
24%, but the apparent variance due to weather, etc., would have only one-third 
its true value, and would scarcely attain significance, since the estimated 
experimental error variance is now inflated to 29.00. 
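
The arithmetic of this example can be checked with a short sketch. It assumes, as in the trials described, three replicates of each variety, so that the variance attributable to weather and change of field is one-third of the excess of the combined interaction mean square over the estimated error mean square. The function name and the labels of the scenarios are illustrative only; the mean squares are those quoted in the text.

```python
# Variance due to weather, change of field, etc., for one variety at one
# farm in a single year. Assumption: 3 replicates per trial, as described.
REPLICATES = 3


def variance_component(interaction_ms, error_ms, replicates=REPLICATES):
    """Excess of the interaction mean square over error, per replicate."""
    return (interaction_ms - error_ms) / replicates


# (combined interaction mean square, estimated error mean square), as quoted
scenarios = {
    "randomized blocks": (48.24, 23.28),
    "systematic, real error reduced 25%": (42.42, 26.19),
    "systematic, real error reduced 50%": (36.60, 29.00),
}

components = {
    name: variance_component(interaction, error)
    for name, (interaction, error) in scenarios.items()
}

if __name__ == "__main__":
    reference = components["randomized blocks"]
    for name, value in components.items():
        share = value / reference
        print(f"{name}: {value:.2f} ({share:.0%} of the randomized-block value)")
```

The 50% case comes out at roughly three-tenths of the randomized-block value, in line with the statement that the apparent variance would have only about one-third of its true value.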

Thus by sacrificing the estimate of error we may succeed in increasing the 
accuracy of the final means (over a series of years) of the varietal differences 
at any one place, but not to the same extent as the increase in the accuracy of 
the individual experiments. At the same time we lose all possibility of effectively 
estimating the magnitude of the variation due to weather conditions, changes 
of field, etc. Although this does not invalidate tests of significance of the mean 
varietal differences (for which the effective error is estimated from varieties 
× years and varieties × places × years) it is a drawback which may become of 
importance as soon as the finer points of varietal improvement begin to be 
considered. Moreover the issue is further complicated by the fact that the 
degree of variation produced by changes of weather, etc., may differ with the 
different varieties, so that items in the analysis of variance such as varieties 
× years and varieties × places × years cannot always be regarded as 
homogeneous. 

The lack of a proper estimate of error is also a serious disadvantage if it is 
necessary to increase the accuracy of the experiments, for we cannot ascertain 
what increase in accuracy in the varietal means at a place may be expected on, 
say, doubling the number of replications in each trial. In the above example, 
we can say immediately that this will halve the error mean square (working, 
now, on a two-plot basis), and will reduce all the other mean squares (on the 
average) by the same amount, i.e. 11.64. Thus the effective error will be reduced 
from 48.24 to 36.60, i.e. a reduction of 24%. With systematic arrangements 
which reduced the real experimental error variance by 50%, however, the 
estimated experimental error variance would, as we have seen, be inflated to 
29.00, so that the estimated reduction by doubling the number of replicates 
would be 14.50, giving an expected reduction in effective error, from 36.60 to 
22.10, of 40%. The actual reduction, however, would only be 5.82, i.e. 16%. 
Obviously in the case of the systematic arrangement the number of experiments 
must be increased, since there is little to be gained by increasing the number 
of replicates, but in the absence of a proper estimate of the additional variation 
due to weather, etc., the experimenter has no means of knowing this and may 
be seriously misled. 
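
The comparison made in this paragraph can be sketched as follows. The sketch assumes, as stated above, that doubling the replicates removes half of the experimental error variance from the effective error; the function and variable names are hypothetical, and the figures are those of the example.

```python
def effective_error_after_doubling(effective_error, error_variance):
    """Effective error once the replicates are doubled: half of the
    experimental error variance is removed from the interaction mean square."""
    return effective_error - error_variance / 2.0


def percent_reduction(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


# Randomized blocks: the estimated error (23.28) is also the real error.
rb_after = effective_error_after_doubling(48.24, 23.28)

# Systematic arrangement that halves the real error: the effective error is
# 36.60, the estimated error is inflated to 29.00, the real error is 23.28 / 2.
sys_expected = effective_error_after_doubling(36.60, 29.00)
sys_actual = effective_error_after_doubling(36.60, 23.28 / 2.0)

if __name__ == "__main__":
    print(f"randomized blocks: {percent_reduction(48.24, rb_after):.0f}%")
    print(f"systematic, as estimated: {percent_reduction(36.60, sys_expected):.0f}%")
    print(f"systematic, actual: {percent_reduction(36.60, sys_actual):.0f}%")
```

The gap between the last two figures is the point of the paragraph: the inflated error estimate promises far more from extra replication than the arrangement can deliver.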

It may be contended that in practice the estimate of error is not likely to 
be so badly wrong in systematic arrangements as was suggested in the above 
example, and that it provides an upper limit to the true error which is 
sufficiently near the true error to supply all that is really required. This will 
occur if the gain in accuracy is itself small, but in that case systematic arrange- 
ments have little advantage over random arrangements. Nor must it be for- 
gotten that the estimate may possibly be an underestimate, through some source 
of disturbance being overlooked, as in the half-drill strip arrangement. 

To sum up, systematic arrangements, when used in multiple trials, do not 
prevent valid estimates of error and tests of significance being made for the 
more important types of difference. They do, however, fail to furnish estimates 
of the various classes of residual variation, as distinct from experimental error, 
and this prevents the most effective balance being struck between number of 
replicates in a single trial and number of trials, apart from any interest that 
attaches to these classes of variation. The loss of efficiency from this cause, and 
the slower progress made in improving experimental designs when the errors 
are unknown, may well outweigh the possible immediate gain in accuracy. 
In general it would appear better to use some type of random design, such as the 
quasi-factorial or split-plot Latin square designs, which introduce additional 
components of balance into the arrangement, while still furnishing valid 
estimates of error. The split-plot Latin square design, which is of special interest 
for simple varietal trials such as those conducted by Immer, is discussed in 
the next section. 

In any case it should be stressed that even if a single systematic arrangement 
is used it is absolutely essential to allocate the varieties or treatments at random 
to the sets of plots receiving the same treatment in this arrangement, and to do 
this afresh for every trial. Otherwise biases may be introduced and the different 
treatment comparisons will be subject to varying errors. These requirements 
are just as important if each trial consists of only a single replication. Neglect 
of this precaution will cast suspicion on any conclusions that may be drawn. 


13. VARIETY TRIALS IN SPLIT-PLOT LATIN SQUARES 


The need is sometimes felt in varietal trials carried out at a number of 
centres for arrangements for a moderate number of varieties involving three or 
four replications only. For such trials arrangements which have as a basis a 
Latin square with split plots may be of use. 


     2   1   7  10  |   6  11   9   4  |   3   8  12   5 
     4   6   9  11  |   3   5  12   8  |   1   2   7  10 
     3   5   8  12  |  10   ?   ?   1  |   4  11   6   9 


Fig. 9. Arrangement of twelve varieties in a split-plot Latin square. 

If the varieties to be tested are divided into groups, equal in number to the 
proposed number of replications, the groups may be arranged in a Latin square, 
with randomization within the groups of plots forming the Latin square. Fig. 9 
shows such an arrangement for three replications of twelve varieties. In 
structure it consists of a 3 × 3 Latin square made up of the groups of varieties 
(1, 2, 7, 10), (3, 5, 8, 12) and (4, 6, 9, 11), with randomization within each set 
of four plots. 

The analysis of variance can be conducted rigorously by subdividing it into 
two parts, as in the ordinary split-plot design. The partition of the degrees of 
freedom is shown in Table XI. If the error mean squares (a) and (b) are combined 
in the ratio 2 : 9 instead of 2 : 18, as would occur if the sums of squares were 
pooled, an unbiased estimate of the average error will result. Provided that the 
sets of four varieties are selected afresh at random for each trial the use of an 
average error for all comparisons is not likely to produce any serious disturbance 
in the analysis of a whole set of trials, though of course no exact general test of 
significance for a single trial is available. If such tests are required for a single 
trial at least four replicates in a 4 × 4 Latin square will be advisable. There will 
then be 6 degrees of freedom available for the estimation of error (a). 


TABLE XI 
Partition of degrees of freedom in a split-plot Latin square 

Latin square (sets of 4 varieties): 
    Rows                 2 
    Columns              2 
    Varieties (a)        2 
    Error (a)            2 
Within sets: 
    Varieties (b)        9 
    Error (b)           18 


It will be seen that arrangements of this type preserve the main features 
of Hudson’s balanced arrangements, while still permitting unbiased estimates 
of error to be made. As an example four such trials were superimposed on the 
potato trial reported by Kalamkar, and used by Hudson. Plots 22 ft. long and 
four rows (12 ft.) wide, with two outside rows rejected, were used. The analyses 
of variance of these four trials are given in Table XII. 


TABLE XII 
Analysis of variance of a set of four split-plot Latin squares 

                                              Mean squares 
                               D.F.   1st trial  2nd trial  3rd trial  4th trial 
Latin square: 
    Rows                         2       7.37      34.77       4.10      50.77 
    Columns                      2     537.52     991.60      10.18       0.40 
    Varieties (a)                2      87.21      10.55      13.23       0.15 
    Error (a)                    2        …          …          …          … 
    Varieties (a) + error (a)    4        …        42.78        …          … 
    Total                        8     178.90     277.98      10.84      14.37 
Within Latin square: 
    Varieties (b)                9      19.28      26.91      13.26        … 
    Error (b)                   18      32.04        …          …          … 
    Varieties (b) + error (b)   27      27.92      23.90      12.15       5.95 
Total                           35      62.43      81.98      11.85       7.87 
Pooled: 
    Varieties                   11      31.63      23.94      13.25       7.38 
    Error                        -      37.92      31.96      12.37       4.74 
    Incorrectly pooled error    20      35.36      27.66      12.02       4.60 

There are several points of interest in this table. The columns of the Latin 
squares (which correspond to 4 × 3 blocks of plots) account for a large part of 
the variance in the first two trials, but for none of the variance in the last two. 
In the fourth trial, but not in the others, the elimination of the rows of the 
Latin square (which correspond to 1 × 12 blocks of plots) has been effective in 
reducing the variance. Both the total and residual mean squares are very 
different in the four trials, although they are all on the same field: this is an 
illustration of the well-known fact that field trials, even of identical pattern 
and on apparently similar areas, vary greatly in their precision, an additional 
reason for providing an estimate of error for each trial. 

The gain in precision is shown by Table XIII, which gives the varieties + error 
mean squares that would be obtained in randomized block experiments on the 
same plots, when the blocks correspond to the rows and to the columns of the 
Latin square respectively. 


TABLE XIII 
Residual mean squares for randomized blocks and split-plot Latin squares 

                               D.F.   1st trial  2nd trial  3rd trial  4th trial 
Rows of square as blocks        33      64.55      84.84      12.32       5.27 
Columns of square as blocks     33      33.63      26.85      11.95       8.33 
Split-plot Latin square          -      36.54      27.33      12.58       5.44 
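
The randomized-block rows of Table XIII follow from Table XII by removing only the block sum of squares from the total. A minimal sketch, assuming 36 plots per trial (35 degrees of freedom in all, of which 2 go to blocks) and using the second and third trials, whose entries are fully legible; the function name is illustrative.

```python
TOTAL_DF = 35   # 36 plots per trial
BLOCK_DF = 2    # three rows, or three columns, of the Latin square


def residual_ms_randomized_blocks(total_ms, block_ms):
    """Varieties + error mean square when one set of blocks is removed."""
    total_ss = total_ms * TOTAL_DF
    block_ss = block_ms * BLOCK_DF
    return (total_ss - block_ss) / (TOTAL_DF - BLOCK_DF)


# Mean squares taken from Table XII: (total, rows, columns)
trials = {
    "2nd trial": (81.98, 34.77, 991.60),
    "3rd trial": (11.85, 4.10, 10.18),
}

table_xiii = {
    name: {
        "rows as blocks": residual_ms_randomized_blocks(total, rows),
        "columns as blocks": residual_ms_randomized_blocks(total, cols),
    }
    for name, (total, rows, cols) in trials.items()
}

if __name__ == "__main__":
    for name, entries in table_xiii.items():
        for layout, value in entries.items():
            print(f"{name}, {layout}: {value:.2f}")
```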


It is clear that except in the fourth trial (which happens to be particularly 
accurate) the use of the columns of the square as blocks is about as effective 
as the split-plot design. These are in fact the most compact form of block and 
would probably be used by the experienced experimenter, but the result is 
largely fortuitous, for with plots of twice the size (4 rows × 44 ft.), and a trial 
occupying the same ground as the first two of the above trials, the residual mean 
squares are very similar in relative magnitude, having the values 271.85, 96.04 
and 104.05 respectively. In this case both forms of block are equally compact, 
but the use of the rows as blocks gives less than half the information obtained 
by the use of columns or the split-plot Latin square. 

It is clear that the more the Latin square component of error exceeds the 
other component the greater will be the inaccuracies introduced by the fact that 
the former is dependent on the two degrees of freedom only. In the first two of 
these trials the variation between the plots of the Latin square, varieties 
(a) + error (a), is substantially, but not excessively, above that within these 
plots, varieties (b) + error (b), and in the other two trials there is little difference. 
If these trials are representative of the type of variation ordinarily met with, 
it appears that the pooled estimates of error will be quite adequate for the 
purpose of estimating the error of the varietal means over a number of trials. 
They will be somewhat less adequate for the purpose of investigating differential 
responses in the different trials, but even here little serious distortion of the 
ordinary tests is likely to result. 

The results of pooling the two estimates of error by merely summing the 
two sums of squares are also shown in Table XII. The biases introduced will 
be apparent on comparison with the properly pooled estimates of error. These 
biases are in no case very large, but there is no advantage in using the incorrect 
estimate other than a slight saving in computational labour. 

An obvious refinement in statistical treatment is to provide two estimates 
of error for each experiment, one for the comparison of varieties falling in the 
same group, and the other for varieties falling in different groups. The first is 
derived directly from error (b), and the second is the mean of error (a) and 
error (b), weighted in the ratio 1 : 3. This, however, somewhat complicates the 
presentation of the results, and may not be worth while. 
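
The ratio 2 : 9 used earlier for the average error can be recovered from these two kinds of comparison. The sketch below assumes the arrangement of Fig. 9 (twelve varieties in three groups of four) and the 1 : 3 weighting just stated for varieties in different groups; the function name is illustrative.

```python
from fractions import Fraction


def average_error_weights(n_varieties=12, group_size=4,
                          between_weight_a=Fraction(1, 4)):
    """Weights of error (a) and error (b) in the error averaged over all
    comparisons of one variety with each of the others."""
    same_group = group_size - 1               # comparisons using error (b) only
    other_groups = n_varieties - group_size   # comparisons using (a), (b) as 1 : 3
    total = same_group + other_groups
    weight_a = other_groups * between_weight_a / total
    weight_b = (same_group + other_groups * (1 - between_weight_a)) / total
    return weight_a, weight_b


# Average over the 3 within-group and 8 between-group comparisons.
weight_a, weight_b = average_error_weights()

# Pooling the sums of squares weights by degrees of freedom (2 and 18) instead.
pooled_a, pooled_b = Fraction(2, 20), Fraction(18, 20)

if __name__ == "__main__":
    print("average error weights:", weight_a, weight_b)
    print("weights from pooled sums of squares:", pooled_a, pooled_b)
```

Since error (a) receives weight 2/11 in the average but only 1/10 when the sums of squares are pooled, the pooled estimate understates the average error whenever error (a) exceeds error (b).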

The split-plot Latin square only differs from the so-called semi-Latin square 
(originally suggested by Gosset under the name of “equalized randomized 
blocks”, and independently put forward by Pitman of Tasmania, to whom the 
name semi-Latin square is due) in that the same groups of varieties are used 
for each of the Latin square plots. If this restriction is removed it is impossible 
to divide the analysis of variance into two parts (unless the number of replicates 
is sufficiently great for a Graeco-Latin square (Yates, 1937) to be used), and the 
resultant estimate of error is consequently biased to the extent indicated by 
the last line of Table XII. There is, however, the compensating advantage that 
the comparisons between the different varieties, and between means of groups 
of varieties, vary less in precision than in the case of the split-plot Latin square. 
Thus twelve varieties can be arranged in a 3 × 3 square so that within the Latin 
square plots each variety occurs with one other variety three times, with six other 
varieties twice, and with four other varieties once, the variances of the differences 
being $\tfrac{2}{3}E'$, $\tfrac{2}{3}(\tfrac{1}{6}E + \tfrac{5}{6}E')$ and 
$\tfrac{2}{3}(\tfrac{1}{4}E + \tfrac{3}{4}E')$ respectively, where $E$ and $E'$ are the 
expectations of the error mean squares (a) and (b). If a split-plot Latin square 
is used there will be three comparisons of the first type and eight of the last for 
each variety. The mean variance, as before, is equal to 
$\tfrac{2}{3}(\tfrac{2}{11}E + \tfrac{9}{11}E')$, whereas the estimate given by the 
analysis of variance is $\tfrac{2}{3}(\tfrac{1}{10}E + \tfrac{9}{10}E')$. 

In conclusion it should be emphasized that the gain in precision obtained 
in this example should not be taken as necessarily representative of the average 
gain likely to accrue under all circumstances. Much obviously depends on the 
shape of plot, type of crop, and other factors. A comprehensive investigation 
covering a representative sample of existing uniformity trials must be under- 
taken before it can be decided whether the gain in precision is sufficient to 
outweigh the statistical defects of the design. The catalogue of uniformity 
trials published by Cochran (1937) is likely to facilitate the selection of material 
suitable for such an investigation. 


14. SUMMARY 


The recent claims advanced in favour of systematic arrangements by Gosset 
(“Student”) and others are examined. The conclusion is reached that in cases 
where Latin square designs can be used, and in many cases where randomized 
blocks have to be employed, the gain in accuracy with systematic arrangements 
is not likely to be sufficiently great to outweigh the disadvantages to which 
systematic designs are subject. In particular the available evidence, though 
not conclusive, indicates that the half-drill strip arrangement, which Gosset 
particularly favoured, is likely to be somewhat less accurate than suitable 
random arrangements occupying the same plots. On the other hand, systematic 
arrangements may in certain cases give decidedly greater accuracy than 
randomized blocks, but it appears that in such cases the use of the modern 
devices of confounding, quasi-factorial designs, or split-plot Latin squares, which 
are much more satisfactory statistically, is likely to give a similar gain in 
accuracy. 

As an example the uniformity trial chosen by Barbacki & Fisher to demon- 
strate the defects of the half-drill strip arrangement is re-examined. It is shown 
that Gosset’s criticisms of Barbacki & Fisher’s work, though at first sight 
convincing, are not as conclusive as he supposed, and that in fact this particular 
trial provides a striking example of just those defects which have always been 
attributed to the half-drill strip method by its critics. 


Added Note on Wiebe’s Uniformity Trial 


As a result of correspondence between Prof. Pearson and Dr Wiebe, which 
the former has kindly passed on to me, it is possible to offer an explanation of the 
periodic variations in yield which led to the highly significant result in the half- 
drill strip arrangement discussed in § 5. 

It will be apparent that slight errors in directing the drill will produce 
unequal spacing between the last row of one drill width and the first row of the 
next, a point I overlooked in my discussion. If these errors are randomly distri- 
buted this will merely result in some increase of experimental error, but if they 
alternate in sign one variety will be favoured at the expense of the other, and a 
bias will result. This has occurred in the present trial. 

The intended distance between each pair of drill rows was 12 in. (8 ft. 
between drill strips). The actual distances between the neighbouring rows of 
consecutive drill strips (averages of thirty-six measurements), supplied by Wiebe, 
are as follows: 


Drill strips  Distance (in.)    Drill strips  Distance (in.)    Drill strips  Distance (in.) 

 1 and 2          10.2           6 and 7          14.2          11 and 12         11.2 
 2 and 3          12.4           7 and 8          11.8          12 and 13         14.0 
 3 and 4          11.7           8 and 9          13.8          13 and 14         11.3 
 4 and 5          13.4           9 and 10         12.2          14 and 15         12.9 
 5 and 6          10.6          10 and 11         13.1          15 and 16         12.4 


Wiebe (1937) determined the effect of increase or decrease of the spacing 
between drills on the yield of the edge rows of each drill strip. He found that 
an increase of 1 in. over the normal spacing increased the yield of each of the 
neighbouring rows by 258 g. This, as might be expected, is slightly less than the 
value, 293 g., given by assuming that the yield of a row is directly proportional 
to the available area. Adjusting the figures of Table I, we obtain the values 
for the differences A − B − B + A shown in the table below. 

The value of t is reduced to 1.80, so that a good deal of the original excess of 
A over B can be attributed to drilling rather than to periodic variations in fertility. 
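
The alternation in the drilling errors can be seen directly from Wiebe's measurements. A rough sketch, assuming the fifteen distances tabulated above, the intended spacing of 12 in., and the effect of 258 g. per inch on each neighbouring row; the split into odd and even junctions is an illustration only, and is not the adjustment actually applied to the figures of Table I.

```python
INTENDED_GAP_IN = 12.0
YIELD_PER_INCH_G = 258.0   # yield change of each neighbouring row per inch of spacing

# Measured distances (in.) between neighbouring rows of drill strips
# 1-2, 2-3, ..., 15-16, as supplied by Wiebe.
gaps = [10.2, 12.4, 11.7, 13.4, 10.6, 14.2, 11.8, 13.8,
        12.2, 13.1, 11.2, 14.0, 11.3, 12.9, 12.4]

odd_junctions = gaps[0::2]    # strips 1-2, 3-4, 5-6, ...
even_junctions = gaps[1::2]   # strips 2-3, 4-5, 6-7, ...


def mean(values):
    return sum(values) / len(values)


def edge_row_effect(gap_in):
    """Approximate yield change (g.) of one edge row for a given gap."""
    return YIELD_PER_INCH_G * (gap_in - INTENDED_GAP_IN)


mean_odd = mean(odd_junctions)
mean_even = mean(even_junctions)

if __name__ == "__main__":
    print(f"mean gap, odd junctions:  {mean_odd:.2f} in.")
    print(f"mean gap, even junctions: {mean_even:.2f} in.")
    print(f"edge-row effect at the mean even gap: {edge_row_effect(mean_even):.0f} g.")
```

The two sets of junctions differ by about 2 in. on the average, which is the alternation in sign described above as the source of the bias.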

It may be noted, however, that had the centre two rows of each half-drill strip 
only been retained (as might reasonably be done in an actual trial in order to 
eliminate both competition effects and irregularities of drilling), the value of t 
would still have been 2.32 (5% point = 2.36). 


Original | Adjusted Original | Adjusted | 
values values values values | 
+13 | +3 0 iG 
| $4. +8 +21 | +9 
+16 0 +27 +16 
+8 +28 +24 | 


The fact that the disturbance in the original results is due to a systematic 
error in drilling, and not to a periodic fertility wave, does not of course affect 
the general issue. Indeed it serves to emphasize the numerous possibilities of 
bias which are always present in systematic arrangements. Had the arrange- 
ment been a random one a systematic error of this kind would have produced no 
harmful results. 

On the other hand it should be stated, in fairness to Gosset, that as a result 
of inspections of Dr Beaven’s trials, he became aware of this particular source of 
bias, and drew attention to it in an addendum to his 1923 paper, where he stated 
that measurements on the stubble “showed not only that such inaccuracies 
occur, but also that they can favour one of the varieties”, and added that such 
measurements were customarily being made to correct for this. The alternative 
method of rejecting certain rows entirely is probably in more common use, but it 
is to be noted that many descriptions of the half-drill strip method do not 
mention the matter, which can hardly have been regarded as a serious source of 
bias. 


MISCELLANEA 

(i) A Correction to “A Generalization of Fisher's z test” 
By D. N. LAWLEY 


I wish to correct and apologize for an error in my paper in the current volume of Biometrika 
(30, 180-7). The derivation of the distribution of $v^2$ in § 2 is unfortunately wrong, and thus 
the quantity 
Ng p|A’|! 
does not, as supposed, follow Fisher’s z distribution except when $n_1 = 1$ or as an 
approximation when $n_2$ is large. 
If the distribution obtained for v? were correct, then the quantity u = n,v*/n, would be 
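distributed as $\chi_1^2/\chi_2^2$, where $\chi_1^2$ has $n_1 p$ degrees of freedom and $\chi_2^2$ has 
$(n_2 - p + 1)$. We should then have 

$$E(u) = \frac{n_1 p}{n_2 - p - 1},$$

and 

$$E(u^2) = \frac{n_1 p\,(n_1 p + 2)}{(n_2 - p - 1)(n_2 - p - 3)}.$$

Thus 

$$\sigma_0^2 = E(u^2) - \{E(u)\}^2 = n_1 p \left\{ \frac{n_1 p + 2}{(n_2 - p - 1)(n_2 - p - 3)} - \frac{n_1 p}{(n_2 - p - 1)^2} \right\}.$$
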
In actual fact the distribution of $u$ is somewhat more complicated in form, and I have 
been unable to obtain an explicit expression for it. It does, however, approximate to the 
distribution of $\chi_1^2/\chi_2^2$ when $n_2$ is large, and we can obtain some idea of the nature of the 
approximation by finding the true mean and variance and comparing them with the values 
given above. 
Using the notation adopted before we have for the moment-generating function of $u$ 
$M(t) = E(e^{ut})$ 


in, 


We may suppose without loss of generality that the variances and covariances of the 
distribution are all unity and zero respectively, i.e. that $c_{ij} = \delta_{ij}$ (where $\delta_{ij} = 0$ when 
$i \neq j$, and $= 1$ when $i = j$). Then since 


we shall have 


bi 
a’,, bn 
Therefore Mit) = || | 1 


= E{1 —7(2t) 


where | A’ | r=sum of all principal minors of | A’ | of order (p—1), and | A’ | s=sum 
of all principal minors of | A’ | of order (p—2). Thus 


$$M(t) = 1 + \mu_1' t + \mu_2' \frac{t^2}{2!} + \dots,$$

where 
$$\mu_1' = \tfrac{1}{2} n_1 \times 2E(r) = n_1 p\,E\{A'_{11}/|A'|\} = \frac{n_1 p}{n_2 - p - 1},$$
9 
putting P = Xr"), = E(s). 
Now when n, = | it is known that 
Hence from (1) (2) 
But it may easily be proved that 
1 
Q = 40(p—1) (3) 


—p) 
Therefore from (1), (2) and (3) we have 


9». 
Hence = — = 0,27 — 


—p) 


2 


It will be seen that although $\mu_1' = E(u) = E(\chi_1^2/\chi_2^2)$ the variance $\sigma^2$ of $u$ is less than that of 
$\chi_1^2/\chi_2^2$ by an amount 
$$\frac{2 n_1 p\,(n_1 - 1)(p - 1)}{(n_2 - p)(n_2 - p - 1)(n_2 - p - 3)},$$
and that the proportionate error in the variance is 
of 


o2 


2 


This would seem to indicate that if $(n_1 - 1)(p - 1)/n_2$ is fairly small the error made by 
supposing $u$ to be distributed as $\chi_1^2/\chi_2^2$ will not be very serious, and hence the x% point 
of Fisher’s z with degrees of freedom $N_1 = n_1 p$ and $N_2 = (n_2 - p + 1)$ will be an approximation 
to the x% point of Z (defined as before on p. above). 

A better approximation may be obtained by supposing $u$ to be distributed as the ratio 
of two $\chi^2$, but altering the degrees of freedom so that the mean and variance of $u$ have the 
correct values. Then $\chi_1^2$ and $\chi_2^2$ have degrees of freedom $N_1'$ and $N_2'$ respectively, where 

$N_1'$ = nearest integer to $\{1 + (n_1 - 1)(p - 1)/n_2\}\, n_1 p$ 
and $N_2'$ = nearest integer to $\{1 + (n_1 - 1)(p - 1)/n_2\}\,(n_2 - p + 1)$. 

If we now find the x% point of a z having degrees of freedom $N_1'$ and $N_2'$ it will be 
a further approximation to the x% point of Z. 

To illustrate this consider the numerical example which I gave. 

We had $p = 2$, $n_1 = 5$ and $(n_1 - 1)(p - 1)/n_2 = 2/15$. 
Thus $N_1' = 11$, $N_2' = 33$. 

The 0.1% point of z with degrees of freedom 11 and 33 is 0.687 (approximately). 

This is a better approximation to the 0.1% point of Z than the value 0.728 previously 
obtained by taking degrees of freedom 10 and 29. 

It will be noted that the significance of the value of Z obtained from the sample, 
i.e. Z = 1.0026, is still further increased. 
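
The adjusted degrees of freedom of this example can be reproduced with a few lines. In the sketch below p = 2 and n1 = 5 are as given, while n2 = 30 is an inference from the unadjusted degrees of freedom 10 and 29 (since n2 - p + 1 = 29) and agrees with the stated value 2/15; the function name is illustrative.

```python
from fractions import Fraction


def adjusted_degrees_of_freedom(n1, n2, p):
    """Degrees of freedom N1', N2' of the improved approximation: both
    n1*p and (n2 - p + 1) are scaled by 1 + (n1 - 1)(p - 1)/n2 and
    rounded to the nearest integer."""
    factor = 1 + Fraction((n1 - 1) * (p - 1), n2)
    n1_adjusted = round(factor * n1 * p)
    n2_adjusted = round(factor * (n2 - p + 1))
    return n1_adjusted, n2_adjusted


if __name__ == "__main__":
    # n2 = 30 is inferred, not stated explicitly in the note.
    print(adjusted_degrees_of_freedom(n1=5, n2=30, p=2))   # (11, 33)
```

Exact fractions are used so that the rounding does not depend on floating-point error; here the scaled values are 34/3 and 493/15, well away from a tie.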


(ii) Twenty-Five Years of Health Progress. By L. I. Dublin and A. J. Lotka. 
New York: Metropolitan Life Insurance Company. 


The Metropolitan Life Insurance Company of New York, the largest of its kind in the 
world, has just completed a mortality investigation of great social importance. During the 
quarter century from 1911 to 1935 the weekly premium-paying policyholders of the com- 
pany, growing from eight to seventeen millions, contributed over three hundred and 
forty-six million years of life exposed to risk of death between the ages of 1 and 74; of these 
lives assured, 3,200,000 died. Such gigantic figures, collected and analysed with precision, 
cannot fail to contain facts of great importance to all concerned in that important task, 
the lengthening of human life. 

The figures relate to a fairly representative sample of the urban dwellers of the United 
States, and their interpretation is not immediately applicable to any assigned section of 
the population of England or, in particular, to the policyholders of large industrial assurance 
companies operating in this country. Nevertheless the broad classification of the changes 
in the relative importance of the various causes of death in the twenty-five year period 
under review, and the suggested explanations of such changes, are probably similar to the 
results which would be obtained in an English experience. Apart from the question of their 
applicability to conditions over here, the results of the initiative of the Metropolitan Life 
are of extraordinary interest and value, partly because they are based on such large numbers 
exposed to risk, partly because they are interpreted so skilfully, but mainly because they 
demonstrate plainly the effect of a modern environment on a body of lives resident in a 
civilized and progressive state. 

The importance of this effect may be appreciated when it is stated that the average 
lifetime has been extended, during the period considered, by nearly fourteen years; or, 
stated another way, the standardized death rate (based on the 1901 ‘‘standard million” 
of England and Wales) has fallen from 1355 in 1911 to 763 per 100,000 in 1935—a decrease 
of nearly 44% on the 1911 figure. The mode of this decrease is of such interest that, by 
kind permission of the Metropolitan Life Insurance Company, a graphical representation 
of the course of the standardized annual death rate per 100,000 between 1911 and 1935 is 
reproduced below. The curve may be analysed into three sections, 1911 to 1917, 1917 to 
1921 and 1921 to 1935, the first and last with fairly uniform and almost parallel downward 
slopes, the second, a period of sudden and arbitrary changes. The trend line of the first 
period has been continued (dotted) up to 1935 and shows that immediately after the 
influenza pandemic there occurred an improvement in mortality which has advanced the 
curve of death rates by about thirteen years. The apparent explanation that the post- 
pandemic decline was due to the extermination, by influenza or its concomitants, of a large 
number of chronically invalid persons will not bear inspection, for the heavy death rate in 
1918 was due to an increase in the number of deaths of young and middle-aged persons, the 
former being free from the degenerative diseases. The authors incline to the view that the 
sudden change in the level of mortality after 1918 may be due to the alteration in the 
bacteriological environment caused by the influenza epidemic. 

[Figure: All Causes of Death. Standardized Annual Death Rates per 100,000. Total Persons, 
Ages 1 to 74 Years. Metropolitan Life Insurance Company, Industrial Dept., 1911 to 1935.] 

After the opening chapters on the trend of longevity and the general mortality from all 
causes, the principal individual causes of death are dealt with, more or less in the order 
in which they appear in the International List of Causes of Death. In each case age, sex 
and colour (negro or white) are differentiated and the trend of the mortality rates is 
considered. The data and their interpretation are so rich and manifold in their implications 
that it would be supererogatory to comment upon them in detail. The extraordinary control 
of diphtheria accomplished in New York city, the striking decline in mortality from tuber- 
culosis, the exaggerated, but commonly held, views upon the increase in cancer mortality, 
the desirability of further improvement in the mortality from the cardiovascular-renal 
diseases at the middle ages of life, the “wholesale slaughter” caused by automobile 
accidents, all these and many other things of more than passing interest find their place 
in this epic account of the amelioration in mortality rates produced by the advances in 
medical science and the spread of the public health movement. That the Metropolitan 
Life has played an important part in educating the American public to its duty in matters 
of health is indicated at many points of this book; that it intends to continue the education 
of its increasing circle of lives assured augurs well for the future of the health of the 
American people. 


(iii) Note on Professor Pitman’s contribution to the theory of estimation 
By E. S. PEARSON 


It is to be hoped that Prof. Pitman’s interesting contribution, published on pp. 391-421 
above, will later give rise to further discussion on the problem of estimation in this Journal or 
elsewhere. There is one point on which I should, however, like to make a brief comment now. 

In the footnote to p. 392 Prof. Pitman suggests that his paper will show that Fisher’s theory 
of fiducial probability and Neyman’s theory of confidence intervals are essentially the same. 
That they are closely related, and that in very many practical cases they will lead to precisely 
the same form of procedure, is evident. Nevertheless, I feel that there are certain differences 
in the initial approach which at the present stage of development of the theory of interval 
estimation it is important to keep clear, since otherwise apparent disagreement arising at 
a later stage may lead to unnecessary misunderstanding. 

I believe I am correct in saying that the following has been Prof. Neyman’s line of approach 
to the subject. He has considered the basis of a general procedure which will provide rules 
for obtaining from observed data an interval that will cover the unknown parameter with a 
given probability. The probability is associated with repeated employment of that particular 
rule or method and thus if, for a specified sample of m observations, it happens that two rules 
lead to the same interval but associate with it different probabilities, there is no incon- 


sistency. For example, if x,,%2, ...,2,, be a random sample of m observations from a normal 
) opulation, and 
PoP = 2 (x,—Z)*/n, 


w = range = largest x — smallest x, 


it is possible to determine the multipliers a and (approximately) b so as to make the following 
statements about the unknown standard deviation o in the sampled population: 


<0 <ays, probability of being correct,0-99;  —...... (A) 


b,w<o<b,w, probability of being correct,0-98. =... (B) 


It is then possible, if unlikely,* that the configuration of the sample $x$’s will be such that 
$a_1 s = b_1 w$, $a_2 s = b_2 w$, so that two different probability statements are associated with the 
same interval. Following Neyman’s approach, there is no inconsistency in this result, since 
one probability is associated with the employment of the s-rule, the other with the w-rule. 
It is only when we try to divorce the probability measure from the rule and to regard 
the former as something associated with a particular interval, that the need for a unique 
probability measure seems to be felt. It is such a measure, no doubt, that Fisher would 
define as a fiducial probability. 
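
The attachment of the probability to the rule, and not to any one interval, may be illustrated by a small sampling experiment. The following is only a sketch under stated assumptions: the multipliers are found approximately by simulation with equal tails, and the sample size n = 10, the value of sigma, the seed and all function names are chosen here for illustration and are not taken from the text.

```python
import math
import random


def sd_n(x):
    """Standard deviation with divisor n, matching s in the text."""
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def sample_range(x):
    """w = largest x - smallest x."""
    return max(x) - min(x)


def calibrate(rule, n, prob, reps, rng):
    """Approximate multipliers (lo, hi) such that lo*stat < sigma < hi*stat
    holds with probability prob. Equal tails are an assumption of this sketch."""
    # Simulated values of stat / sigma, taking sigma = 1.
    ratios = sorted(rule([rng.gauss(0.0, 1.0) for _ in range(n)])
                    for _ in range(reps))
    tail = (1.0 - prob) / 2.0
    q_lo = ratios[int(tail * reps)]
    q_hi = ratios[int((1.0 - tail) * reps) - 1]
    return 1.0 / q_hi, 1.0 / q_lo


def coverage(rule, lo, hi, n, sigma, reps, rng):
    """Long-run frequency with which the rule's interval covers sigma."""
    hits = 0
    for _ in range(reps):
        stat = rule([rng.gauss(5.0, sigma) for _ in range(n)])
        hits += lo * stat < sigma < hi * stat
    return hits / reps


rng = random.Random(1939)   # fixed seed, illustrative only
n, sigma = 10, 2.5          # illustrative values, not from the text
a1, a2 = calibrate(sd_n, n, 0.99, 20000, rng)
b1, b2 = calibrate(sample_range, n, 0.98, 20000, rng)
cov_s = coverage(sd_n, a1, a2, n, sigma, 20000, rng)
cov_w = coverage(sample_range, b1, b2, n, sigma, 20000, rng)
print(f"s-rule: {a1:.3f} s < sigma < {a2:.3f} s, observed frequency {cov_s:.3f}")
print(f"w-rule: {b1:.3f} w < sigma < {b2:.3f} w, observed frequency {cov_w:.3f}")
```

Each frequency belongs to the repeated use of its own rule; nothing in the experiment assigns a probability to a single interval that the two rules might happen to share.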
The following quotation from the Appendix of Neyman’s paper on “Aspects of the 
representative method” (1934, p. 624) will illustrate this idea further. He was discussing 
the prediction of limits for the unknown proportion, p, of black balls in a bag after X black 
balls had appeared in a randomly drawn sample of three balls, and wrote: 
“Having noticed this, we fix a rule as follows: 
“If in the sample which we shall draw, X will have the value 

$$X = 0 \text{ then we shall state that } 0 < p < \pi_1,$$

... 

We are aware that the statement which we shall make, in applying this rule to the result of 
actual sampling, may be wrong or may be true. We calculate the probability, P, that the 
statement will be a true one, and try to arrange the system of values of the $\pi$’s so as to have 
$P > 0.95$.... Making statements following the rules set out above, we know something 
important about the results of these statements: the probability that we shall be wrong is then 
$< 0.05$.” 

* This will mean, in the first place, that the particular $a$ and $b$ factors chosen are such that 
$a_1/a_2 = b_1/b_2$, and then that the sample is one in which $w/s = a_1/b_1$. 

We may usefully compare this statement with one from Prof. Pitman’s present paper. 
Thus he writes (p. 396): 

“The statement 

$$a \in I(x_1, \ldots, x_n) \qquad (4)$$

is a variable statement which is a function of $x_1, \ldots, x_n$. When particular, actually observed 
values of $x_1, \ldots, x_n$ are inserted in it, we obtain a definite statement about the unknown 
parameter $a$ that is either true or false, and we shall not know which it is; but we do know that 
the probability that the variable statement (4), when used in this way, will give a true 
particular statement about $a$ is $\alpha$ (supposed constant).... If we decide upon $\alpha$, say 0.95, and then 
define $I$ accordingly, we shall have a rule for automatically making a definite statement 
about the unknown parameter $a$ whenever a set of values of the chance variable $X$ is observed. 
A statistician using this rule can expect to be right about 95 times out of 100.” 

The correspondence between these two descriptions of the meaning of the probability 
statements associated with a confidence or fiducial interval is clear. The essential point of 
this agreement is that the probability of 0.95 is not the probability that the parameter 
estimated lies between any fixed limits but that a variable statement about this parameter, 
made according to a specified rule, will be correct. Having started with this common 
interpretation of the probability statement associated with an interval, the further steps taken 
by Neyman and Pitman diverge. The difference is exemplified by a sentence which I have 
omitted from the quotation from Pitman’s paper: 

“As R. A. Fisher expresses it, the fiducial probability of the variable statement 

$$a \in I(x_1, \ldots, x_n)$$

is $\alpha$.” 
Now Fisher (1935, 1936) has emphasized that if a sufficient estimate of the unknown 
parameter $a$ exists, a fiducial statement can only be made in terms of this estimate, on the 
grounds that it alone contains the whole of the available “information”. When there is no 
sufficient estimate he has suggested (1936, pp. 256-7) another possible line of attack. It is 
this suggestion, involving the use of the sampling distribution of an estimate within samples 
having a given “configuration”, which Pitman has followed out. It involves what is 
essentially a different method from Neyman’s of choice between possible rules for determining an 
interval from given data. 

It will be noted that when Prof. Pitman comes to apply his theory to the case of the 
normal distribution (pp. 406-8), all his fiducial statements regarding the unknown 
population standard deviation are expressed in terms of $S = \sum (x^2)$, that is to say, in terms of the 
sufficient statistic. Neyman’s approach involves no initial limitation of this kind; as stated 
above, the interval could be defined in terms of the sample range. If the confidence limits 
which he accepts finally for the unknown variance are also expressed in terms of S, he has 
arrived at this result by a different route. Further, it is a route which, when there is no 
sufficient statistic, it can be shown will not always lead to the same solution as Pitman’s. 

It is not difficult to see just where this divergence, after initial agreement, has occurred. 
Neyman (1937) has shown that any system of confidence intervals is equivalent to some 
system of “regions of acceptance”. Consequently, when making a choice out of an unlimited 
set of regions of acceptance so as to satisfy a maximum criterion as described below, he is 
sure of obtaining the absolute maximum. On the other hand, if I understand Prof. Pitman 
correctly, a restriction is placed at an early stage on the form of his regions of acceptance 
(p. 395); these are composed of intervals $I'$ from the lines $L$ which are to be such that 
$P\{I' \mid L\} = \alpha$ and is constant for every $L$ (p. 394). Samples represented by points on a given $L$ 
all have what has been termed by Fisher the same configuration. In introducing this 
restriction Pitman is following Fisher’s approach and not Neyman’s. 

It may be useful if I conclude with a brief description of what I have referred to as 
Neyman’s maximum criterion. Having established the procedure leading to the association 
of a probability statement with a specified rule for determining an interval, it becomes 
necessary from the practical point of view to make a choice between alternative rules. Here, 
Neyman would say, there can be no question of an absolute right or wrong. All that can be 
done is to suggest a principle or principles, the following of which appears to have a strong 
intuitional appeal; to base an appeal on the consequences that will follow from the 
continued application of a given rule is a procedure which has been accepted as intuitionally 
sound by the human mind. 

In Neyman’s view, in the example of the normal curve given above, to say that because 
$S = \sum (x_i - \bar{x})^2$ and $\bar{x}$ are jointly sufficient statistics with regard to the population standard 
deviation and mean, $\sigma$ and $\xi$, therefore the statement (A) is to be preferred to (B), is not by 
itself an argument with direct enough appeal to be convincing. To say that $S$ contains the 
whole of the relevant information about $\sigma$ does not provide an answer until we have been 
able to define just what is the nature of the information that we hope to obtain. In any 
case the principle of sufficiency could not be enough to determine the most appropriate 
interval: 

(1) It will not suffice if there is no sufficient statistic. This suggests that the general 
principles of choice should lie somewhat deeper; they may result in the choice of a sufficient 
statistic when it exists, but this is a secondary result, not a primary reason. 

(2) Even when it has been decided to base the rule on a sufficient statistic, we are still 
left in doubt as to how to select, e.g. in equation (A), from the infinite set of pairs of factors 
$a_1$ and $a_2$, with all of which the same probability will be associated. Again some deeper basis 
of choice is needed. 

(3) There might well be problems in which, even when a sufficient statistic exists, the 
use of a rule based on some other function of the observations would have a stronger appeal, 
e.g. on grounds of speed in calculation, inadequacy of recorded data, etc. At any rate, it is desirable to keep 
an open mind on such points, and allow elasticity in method. 
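
Point (2) may be illustrated numerically. The sketch below, which is not part of the original argument, finds by simulation three different pairs of factors for statement (A), each leaving a total probability of 0.01 outside the interval but dividing it differently between the two tails. The sample size n = 10, the tail shares, the seed and the function names are assumptions made for illustration.

```python
import math
import random


def sd_n(x):
    """Standard deviation with divisor n, matching s in the text."""
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def simulate_ratios(n, reps, rng):
    """Sorted simulated values of s / sigma (sigma = 1) for normal samples."""
    return sorted(sd_n([rng.gauss(0.0, 1.0) for _ in range(n)])
                  for _ in range(reps))


def factor_pair(ratios, prob, lower_share):
    """Factors (a1, a2) with a1*s < sigma < a2*s at approximately prob,
    placing lower_share of the excluded probability in the lower tail
    of the distribution of s / sigma."""
    reps = len(ratios)
    lo_tail = (1.0 - prob) * lower_share
    hi_tail = (1.0 - prob) - lo_tail
    q_lo = ratios[int(lo_tail * reps)]
    q_hi = ratios[int((1.0 - hi_tail) * reps) - 1]
    return 1.0 / q_hi, 1.0 / q_lo


def coverage(ratios, a1, a2):
    """Fraction of simulated samples for which a1*s < sigma < a2*s (sigma = 1)."""
    return sum(a1 * r < 1.0 < a2 * r for r in ratios) / len(ratios)


rng = random.Random(7)                 # fixed seed, illustrative only
n = 10                                 # illustrative sample size
calib = simulate_ratios(n, 40000, rng)  # used to find the factors
fresh = simulate_ratios(n, 20000, rng)  # independent samples for checking
shares = (0.1, 0.5, 0.9)
pairs = [factor_pair(calib, 0.99, share) for share in shares]
for share, (a1, a2) in zip(shares, pairs):
    print(f"lower-tail share {share}: a1 = {a1:.3f}, a2 = {a2:.3f}, "
          f"a2/a1 = {a2 / a1:.3f}, observed frequency {coverage(fresh, a1, a2):.3f}")
```

All three pairs carry approximately the same probability, yet they give intervals of different relative length $a_2/a_1$; sufficiency alone does not say which to prefer.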

Suppose that in the simple case of the interval estimation of a single parameter $\theta$, we 
write a statement in the following form: 

“There is a probability of $\alpha$ that, in following a specified rule for calculating from the 
data the limits $T_1$ and $T_2$, this statement is true: 

$$T_1 < \theta < T_2.$$” 

Under the heading of “consequences of applying the rule”, would come information on 
such points as: 

(1) The distribution of the length of interval $T_2 - T_1$ in sampling from a population 
with $\theta$ fixed. 

(2) The probability that values of $\theta$ differing from the true value $\theta_0$ are included in the 
interval. 

Neyman has suggested that the selection of the appropriate rule should be based in some 
way on a consideration of (2), on the grounds that an objective with a simple intuitional 
appeal is the following: 

“If $\theta_0$ is the true value of the parameter in the sampled population, and $\theta_1$ some other 
value not equal to $\theta_0$, then it is desirable to make the chance that the interval includes $\theta_1$ 
decrease as rapidly as possible as $|\theta_0 - \theta_1|$ increases.” 
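
The criterion can be examined by a sampling experiment for the two rules of the normal example. The sketch below is an illustration only and does not reproduce Neyman’s derivation: it takes equal-tail intervals at a common probability of 0.95 from the s-rule and the w-rule, and estimates the chance that each covers a false value equal to $k$ times the true standard deviation. The sample size n = 20, the values of $k$, the seed and the function names are assumptions.

```python
import math
import random


def sd_n(x):
    """Standard deviation with divisor n, matching s in the text."""
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def sample_range(x):
    """w = largest x - smallest x."""
    return max(x) - min(x)


def simulate(rule, n, reps, rng):
    """Sorted simulated values of stat / sigma, taking the true sigma = 1."""
    return sorted(rule([rng.gauss(0.0, 1.0) for _ in range(n)])
                  for _ in range(reps))


def equal_tail_factors(ratios, prob):
    """Factors (lo, hi) with lo*stat < sigma < hi*stat at approximately prob."""
    reps = len(ratios)
    tail = (1.0 - prob) / 2.0
    q_lo = ratios[int(tail * reps)]
    q_hi = ratios[int((1.0 - tail) * reps) - 1]
    return 1.0 / q_hi, 1.0 / q_lo


def chance_of_covering(ratios, lo, hi, theta):
    """Estimated chance that lo*stat < theta < hi*stat when the true sigma is 1."""
    return sum(lo * r < theta < hi * r for r in ratios) / len(ratios)


rng = random.Random(1937)          # fixed seed, illustrative only
n, prob = 20, 0.95                 # illustrative values, not from the text
false_values = (1.0, 1.25, 1.5, 2.0)   # k = 1.0 is the true value
results = {}
for name, rule in (("s-rule", sd_n), ("w-rule", sample_range)):
    lo, hi = equal_tail_factors(simulate(rule, n, 20000, rng), prob)
    fresh = simulate(rule, n, 20000, rng)
    results[name] = {k: chance_of_covering(fresh, lo, hi, k) for k in false_values}
    print(name, {k: round(p, 3) for k, p in results[name].items()})
```

Both rules cover the true value with a frequency near 0.95; the comparison of interest is how quickly the chance of covering a false value falls away as that value moves from the true one.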

This approach links up with that from which Neyman and I have attacked the problem 
of testing statistical hypotheses, but its justification does not rest on the fact that it is so 
related, but rather on what I have termed its intuitional appeal. In so far as it leads to the 
choice of an interval based on a sufficient statistic if one exists, that is valuable knowledge. 
Our point of view has, however, been that no property of mathematical functions can 
be accepted as the primary reason for choice of method, because such properties can 
hardly supply the practical experimenter with really satisfying reasons for a choice 
between alternatives. And if the object of the mathematical statistician is to provide tools 
for practical use, it seems important that the connexion between the abstract and the 
perceptual should be expressible in terms of the simplest possible probability concepts. 